Signals Designed for Recovery After Clipping - III: Generalizations


By B. F. LOGAN, JR.* 
(Manuscript received March 26, 1982) 


Let S(b, c) be the class of all real-valued bounded functions s(t) of the form

$$ s(t) = g(t) + \cos ct, \tag{i} $$

where g is bandlimited to [-b, b] and $0 \le b < c < \infty$, and such that

$$ (-1)^k s(k\pi/c) > 0, \quad k = 0, \pm 1, \pm 2, \ldots, \tag{ii} $$

a condition that is always satisfied if |g(t)| < 1. In earlier papers we showed
that such functions could be reconstructed from a knowledge of their zeros in
the interval (t - T, t + T) to within an accuracy $O(e^{-\lambda T})$, where
$\lambda = c - b$. This paper generalizes these results to functions of the form (i)
satisfying the condition that s(t) have only real zeros, a condition which is
weaker than (ii). The bounds on the accuracy of the reconstruction obtained are
weaker. This paper also shows that every interval of length greater than
$2\pi/\lambda$, where $\lambda = c - b > 0$, must contain at least one zero of s(t),
and that s(t) satisfies

$$ |s(t)| \le 2^{p-1}, \quad -\infty < t < \infty, $$

where $p = 2c/\lambda$.


I. INTRODUCTION 


References 1 and 2 present various practical means for recovery of
signals s(t) in a certain class S(b, c) from their zeros. The class S(b,
c) consists of all real-valued bounded signals s(t) of the form

$$ s(t) = g(t) + \cos ct, \tag{1a} $$

where g is bandlimited to [-b, b], $0 \le b < c < \infty$, and such that

$$ (-1)^k s(k\pi/c) > 0, \quad k = 0, \pm 1, \pm 2, \ldots, \tag{1b} $$

which is satisfied, for example, if |g(t)| < 1.
The alternation condition (1b) ensures that s(t) has only real simple
zeros $\{t_k\}$,

$$ \frac{k\pi}{c} < t_k < \frac{(k+1)\pi}{c}, \quad k = 0, \pm 1, \pm 2, \ldots, \tag{1c} $$

i.e., the zeros of s(t) are interlaced with the zeros of sin ct.
The recovery procedures involve operations on the so-called fundamental
function associated with the zeros $\{t_k\}$ of s,

$$ h(t) = J(t) - ct, \tag{2} $$

where J(t) is a jump function increasing by $\pi$ at each zero $t_k$, J(0) = 0.
The linearly decreasing term -ct just offsets the growth of J(t) so
that

$$ -\pi < h(t) < \pi, \quad -\infty < t < \infty. \tag{3} $$
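The interlacing property (1c) and the bound (3) can be checked numerically on a small example. The sketch below is only an illustration: the signal (c = 3, b = 1, g(t) = 0.8 cos(t + 0.3)), the helper names, and the use of bisection are all assumptions made here, not part of the original development.

```python
import math

# Hypothetical example: c = 3, b = 1, g(t) = 0.8*cos(b*t + 0.3), so |g(t)| < 1.
C, B, AMP, PHASE = 3.0, 1.0, 0.8, 0.3

def s(t):
    """Example signal s(t) = cos(ct) + g(t)."""
    return math.cos(C * t) + AMP * math.cos(B * t + PHASE)

def find_zero(func, lo, hi, iters=80):
    """Bisection on [lo, hi]; assumes func(lo) and func(hi) differ in sign."""
    f_lo = func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

NUM_ZEROS = 200
# One zero t_k in each interval (k*pi/c, (k+1)*pi/c), k = 0 .. NUM_ZEROS-1.
zeros = [find_zero(s, k * math.pi / C, (k + 1) * math.pi / C)
         for k in range(NUM_ZEROS)]

def h(t):
    """Fundamental function h(t) = J(t) - c*t for t >= 0, with J(0) = 0."""
    jumps = sum(1 for tk in zeros if tk <= t)
    return math.pi * jumps - C * t

t_max = NUM_ZEROS * math.pi / C
h_values = [h(i * t_max / 5000) for i in range(5000)]
print(max(abs(v) for v in h_values) < math.pi)
```

Bisection is used only because the alternation condition guarantees a sign change across each interval; any bracketing root finder would serve equally well.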


The practicality of the recovery procedures owes to the fact, established
in Ref. 1, that h(t) is a high-pass function, having no spectrum
in the (angular) frequency interval $(-\lambda, \lambda)$, where

$$ \lambda = c - b \tag{4} $$

is the “gap frequency” associated with the class S(b, c). The basic
recovery formula given in Ref. 1 is

$$ s(t) = \tfrac{1}{2}\{\operatorname{sgn} s(t)\}\exp[\hat h(t)], \tag{5} $$

where $\hat h$ is the Hilbert transform of h. Since h(t) is a bounded high-pass
function, a good estimate $\hat h_T(t)$ for $\hat h(t)$ can be made from the
knowledge of h in a finite moving interval (t - T, t + T); i.e., from the
zeros $t_k$ in the interval (t - T, t + T), leading to an estimate $s_T(t)$,

$$ s_T(t) = \tfrac{1}{2}\{\operatorname{sgn} s(t)\}\exp[\hat h_T(t)], \tag{6} $$

such that
$$ |s(t) - s_T(t)| \le \frac{e^{-\lambda T}}{(1 - e^{-\lambda T})^2}\,|s(t)|. \tag{7} $$


In Ref. 2, generalizations of the basic recovery formula were developed,
showing how s(t) could be recovered from bandlimited versions
of h(t) or, equivalently, from bandlimited versions of

$$ h'(t) = \pi \sum_k \delta(t - t_k) - c. \tag{8} $$

Here we wish to extend the validity of the previous results by
removing the alternation condition (1b), simply requiring that s(t) of
the form (1a) have only real zeros $\{t_k\}$. For practical purposes we would
also require the zeros to be simple, but here we allow each zero $t_k$ to
have multiplicity $m_k$.

Several interesting questions now arise: 

1. What is the longest possible zero-free interval for s(t)? 

2. How many zeros, counted according to multiplicity, can s(t) have 
in a closed interval of length T? 

3. How large can | h(t) | be? 

4. How large can | s(t) | be? 

The basic question here is the third question. A bound on |h(t)| is
needed to establish that h(t) is a high-pass function with no spectrum
in $(-\lambda, \lambda)$. In order for |h(t)| to be bounded it is necessary, of course,
for s(t) to have a limited zero-free interval and a limited number of
zeros in any interval of fixed length. The fourth question is one of
corollary interest.

Once we establish that

$$ -M \le h(t) \le M, $$


we conclude that h is high-pass, so the results in Ref. 2 remain valid 
and the reconstruction algorithm in Ref. 1 remains valid, with (7) 
replaced by 


2M 


: -\T 
| s(t) — sr(t)| <5 | s(t) | (55) = i). 


In order to obtain a bound on |h(t)| we consider

$$ h'(t) = \pi \sum_k m_k\,\delta(t - t_k) - c \tag{9} $$

and show that h'(t) is a high-pass distribution with no spectrum in
$(-\lambda, \lambda)$. To do this we must first show that the total mass of the
distribution in any interval of fixed length T is uniformly bounded. A
crude bound is obtained which, together with the exponential decay
in the upper half plane u > 0 of

$$ \frac{s'(t + iu)}{s(t + iu)} + ic, $$

establishes that h'(t) has no spectrum in $(-\lambda, \lambda)$; that is,

$$ \int_{-\infty}^{\infty} f_\lambda(t)\,h'(t)\,dt = 0 $$

or

$$ \pi \sum_k m_k f_\lambda(t_k) = c \int_{-\infty}^{\infty} f_\lambda(t)\,dt, \tag{10} $$

where $f_\lambda(t)$ is any bandlimited function of $L_1$ whose Fourier transform
vanishes outside $(-\lambda, \lambda)$.

The “quadrature formula” (10), together with appropriately chosen
$f_\lambda$, gives an upper bound for the longest possible zero-free interval of
s(t).
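The quadrature formula can be tried numerically. The following sketch assumes a particular example signal (c = 3, b = 1, g(t) = 0.8 cos t, so all zeros are simple and $m_k = 1$) and a particular test function $f_\lambda(t) = \{\sin(\lambda t/2)/(\lambda t/2)\}^2$, whose integral is $2\pi/\lambda$; both choices, and the truncation of the sum to finitely many zeros, are assumptions made for illustration.

```python
import math

# Hypothetical example: s(t) = cos(3t) + 0.8*cos(t); gap frequency lam = c - b = 2.
C, B, AMP = 3.0, 1.0, 0.8
LAM = C - B

def s(t):
    return math.cos(C * t) + AMP * math.cos(B * t)

def find_zero(lo, hi, iters=80):
    """Bisection for the single zero of s in (lo, hi)."""
    f_lo = s(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = s(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f_lam(t):
    """Test function (sin(x)/x)^2 with x = lam*t/2; bandlimited to [-lam, lam]."""
    x = 0.5 * LAM * t
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2

K = 4000  # zeros are collected for |t| up to about K*pi/c; the tail is O(1/K)
zeros = [find_zero(k * math.pi / C, (k + 1) * math.pi / C) for k in range(-K, K)]

lhs = math.pi * sum(f_lam(tk) for tk in zeros)  # pi * sum of m_k f(t_k), m_k = 1
rhs = C * (2.0 * math.pi / LAM)                 # c * integral of f_lam
print(lhs, rhs)
```

The two sides agree up to the truncation error of the sum, which shrinks roughly like 1/K for this slowly decaying test function.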

The fundamental function h(t), under the less restrictive condition,
is of the form

$$ h(t) = J(t) - ct, \tag{11} $$

where now J(t) is a jump function increasing by $m_k\pi$ at each zero $t_k$ of
multiplicity $m_k$. The levels of J(t) are still multiples of $\pi$, but we do
not (necessarily) have J(0) = 0 as before. If, for example, $s(0) \ne 0$,
then $J(0) = n\pi$, where n is determined by the condition that h(t) have
zero average value. To obtain an upper bound on |h(t)| we write h(t)
as an “unbiased” integral of h'(t). An unbiased integral of a high-pass
function is a particular integral that is also high-pass; e.g., the unbiased
integral of $\cos\lambda t$ is $\lambda^{-1}\sin\lambda t$. Owing to the spectral gap
$(-\lambda, \lambda)$ the unbiased integral may be obtained by convolution with an
integrating kernel (see Ref. 3) $I_\lambda(t)$ belonging to $L_1$ satisfying

$$ \int_{-\infty}^{\infty} I_\lambda(t)\,e^{-i\omega t}\,dt = \frac{1}{i\omega}, \quad |\omega| \ge \lambda. \tag{12} $$

There is, then, an equivalence class of kernels $\{I_\lambda(t)\}$. If we further
require

$$ \int_{-\infty}^{\infty} I_\lambda(t)\,dt = 0, \tag{13} $$

for example, by requiring $I_\lambda(t)$ to be an odd function, then

$$ h(t) = \pi \sum_k m_k I_\lambda(t - t_k). \tag{14} $$

We choose a particular integrating kernel $I_\lambda(t)$ and a particular $f_\lambda(t)$
[for which (10) is valid] such that

$$ -f_\lambda(t) \le I_\lambda(t) \le f_\lambda(t), \quad -\infty < t < \infty, \tag{15} $$

and hence obtain

$$ -c \int_{-\infty}^{\infty} f_\lambda(x)\,dx \le h(t) \le c \int_{-\infty}^{\infty} f_\lambda(x)\,dx, \quad -\infty < t < \infty. \tag{16} $$
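The step from (15) to (16) can be spelled out as follows. This is a sketch that reads (15) as $-f_\lambda \le I_\lambda \le f_\lambda$ (consistent with (56) later in the paper) and uses only (10) and (14); the upper bound is shown, and the lower bound follows in the same way from the left-hand inequality.

```latex
\begin{aligned}
h(t) &= \pi \sum_k m_k\, I_\lambda(t - t_k)
      \;\le\; \pi \sum_k m_k\, f_\lambda(t - t_k)
      && \text{by (14), (15), and } m_k > 0, \\
     &= c \int_{-\infty}^{\infty} f_\lambda(t - x)\, dx
      \;=\; c \int_{-\infty}^{\infty} f_\lambda(x)\, dx
      && \text{by (10) applied to } x \mapsto f_\lambda(t - x).
\end{aligned}
```

The second line relies on the fact that a reflected translate of $f_\lambda$ is again an $L_1$ function bandlimited to $(-\lambda, \lambda)$, so that (10) applies to it.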
The upper and lower bounds for h(t) readily give an upper bound for
the number of zeros of s(t), counted according to multiplicity, in a 
closed interval of length T. 

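It is of interest to determine an upper bound for |s(t)|, which
amounts to determining an upper bound for $\hat h(t)$, the Hilbert transform
of h(t), in (5). Now $|\hat h(t)|$ is not bounded, since $\hat h(t)$ has logarithmic
singularities at the discontinuities of h(t), i.e., at the zeros of s(t).
However, $\hat h(t)$ is bounded above. The Hilbert transform is given by
(see Ref. 4)

$$ \hat h(t) = \int_{-\infty}^{\infty} h(x)\,K_\lambda(t - x)\,dx, \tag{17} $$

where $K_\lambda$ is any function of the form

$$ K_\lambda(t) = \frac{f_\lambda(t)}{\pi t} \tag{17a} $$

and

$$ f_\lambda(t) \text{ is bandlimited to } [-\lambda, \lambda], \tag{17b} $$

$$ f_\lambda(0) = 1, \tag{17c} $$

$$ \int_{|t|>1} \frac{|f_\lambda(t)|}{|t|}\,dt < \infty. \tag{17d} $$

The integral in (17) is interpreted as a Cauchy principal value. With
suitable further restrictions on $f_\lambda$ we can write (17) as

$$ \hat h(t) = \frac{1}{\pi}\int_{-\infty}^{\infty} h'(x)\,L_\lambda(t - x)\,dx
            = \sum_k m_k L_\lambda(t - t_k) - \frac{c}{\pi}\int_{-\infty}^{\infty} L_\lambda(x)\,dx, \tag{18} $$

where

$$ L_\lambda(t) = \int_{-\infty}^{t} \pi K_\lambda(x)\,dx, \quad t < 0, \tag{18a} $$

$$ L_\lambda(t) = L_\lambda(-t). \tag{18b} $$

In accord with (17d) the integral in (18a) is absolutely convergent for
t < 0 and we obtain (18b) by choosing $f_\lambda$ even, and such that $L_\lambda$ is
integrable. For the problem here we further choose $f_\lambda$ so that

$$ L_\lambda(t) \le 0, \quad -\infty < t < \infty, \tag{18c} $$

and note that (integrate by parts)

$$ \int_{-\infty}^{\infty} L_\lambda(x)\,dx = 2\int_{-\infty}^{0} L_\lambda(x)\,dx
   = 2\lim_{t \to 0-}\left\{ t L_\lambda(t) - \pi\int_{-\infty}^{t} x K_\lambda(x)\,dx \right\}
   = -2\int_{-\infty}^{0} f_\lambda(x)\,dx = -\int_{-\infty}^{\infty} f_\lambda(x)\,dx. \tag{18d} $$

Then from (18), with (18c) and (18d), we have

$$ \hat h(t) \le \frac{c}{\pi}\int_{-\infty}^{\infty} f_\lambda(x)\,dx, \quad -\infty < t < \infty. \tag{19} $$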


Here, we choose an appropriate $f_\lambda$, subject to (17b) through (17d) and
(18c), so as to minimize the integral in (19).

The results stated below are sharp only for the case $2c/\lambda = m$, an
integer. It is not at all clear how one would proceed to improve the
results when $2c/\lambda$ is not an integer. Still, the results show that signals
of the form (1a) having only real zeros are very “nice” in that they
behave, at worst, much like polynomials of order $[2c/\lambda]$ in $\cos \lambda t/2$.


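II. RESULTS
Theorem 1: Let s(t) be any function of the form

$$ s(t) = \cos ct + g(t), $$

where g(t) is a bounded real-valued function bandlimited to [-b, b],
$0 \le b < c$, such that s(t) has only real zeros $\{t_k\}$ with corresponding
multiplicities $\{m_k\}$. Then the distribution

$$ h'(t) = \pi \sum_k m_k\,\delta(t - t_k) - c $$

has no spectrum in $(-\lambda, \lambda)$, where

$$ \lambda = c - b. $$

That is, for every $f_\lambda$ in $L_1$ of the form

$$ f_\lambda(t) = \int_{-\lambda}^{\lambda} F(\omega)\,e^{i\omega t}\,d\omega $$

we have

$$ \int_{-\infty}^{\infty} f_\lambda(t)\,h'(t)\,dt = 0 $$

or

$$ \pi \sum_k m_k f_\lambda(t_k) = c \int_{-\infty}^{\infty} f_\lambda(t)\,dt. $$

Theorem 2: Suppose

$$ \sum_k \mu_k f_\lambda(t_k) = \int_{-\infty}^{\infty} f_\lambda(t)\,dt, $$

where

$$ \mu_k > 0, \quad t_{k+1} > t_k, $$

holds for every $f_\lambda$ in $L_1$ of the form

$$ f_\lambda(t) = \int_{-\lambda}^{\lambda} F(\omega)\,e^{i\omega t}\,d\omega. $$

Then

$$ t_{k+1} - t_k \le \frac{2\pi}{\lambda}, \quad k = 0, \pm 1, \pm 2, \ldots, $$

where equality holding for any k implies

$$ t_k = \frac{2\pi}{\lambda}\,k + \theta, \quad k = 0, \pm 1, \pm 2, \ldots, $$

with

$$ \mu_k = \frac{2\pi}{\lambda}, \quad k = 0, \pm 1, \pm 2, \ldots. $$

Corollary 2: Let s(t) satisfy the hypotheses of Theorem 1. Then every
interval of length $> 2\pi/\lambda$ contains at least one zero of s(t). Furthermore,
if s(t) has a zero-free open interval of length $2\pi/\lambda$, then

$$ \frac{2c}{\lambda} = m, \text{ an integer } (\ge 2), $$

and

$$ s(t) = \frac{(-1)^n}{2}\left\{2\cos\left(\frac{\lambda t}{2} + \frac{n\pi}{m}\right)\right\}^m, $$

where n is some integer.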
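The extremal signal of Corollary 2 can be examined concretely. The sketch below takes the n = 0 case, $s(t) = \tfrac12 (2\cos \lambda t/2)^m$, with the illustrative values m = 4 and $\lambda = 1.5$ (assumptions made here), expands it into cosines with the binomial theorem, and checks that it has the form cos ct + g(t) with $c = m\lambda/2$ and g bandlimited to $b = c - \lambda$.

```python
import math

def lattice_signal(t, m, lam):
    """Extremal signal s(t) = (1/2) * (2*cos(lam*t/2))**m (the n = 0 case)."""
    return 0.5 * (2.0 * math.cos(0.5 * lam * t)) ** m

def cosine_expansion(m, lam):
    """Return {frequency: amplitude} with s(t) = sum of amplitude*cos(frequency*t).

    Uses (2 cos x)^m = sum over j of C(m, j) * cos((m - 2j) x), j = 0..m.
    """
    spectrum = {}
    for j in range(m + 1):
        freq = abs(m - 2 * j) * 0.5 * lam
        spectrum[freq] = spectrum.get(freq, 0.0) + 0.5 * math.comb(m, j)
    return spectrum

m, lam = 4, 1.5          # illustrative values; c = m*lam/2 = 3
c = 0.5 * m * lam
spec = cosine_expansion(m, lam)

top_amplitude = spec[c]                          # coefficient of cos(ct)
next_frequency = max(f for f in spec if f < c)   # top frequency of g(t)
peak = lattice_signal(0.0, m, lam)               # largest value of |s(t)|
print(top_amplitude, next_frequency, peak)
```

For these values the expansion is s(t) = cos 3t + 4 cos 1.5t + 3, whose zeros lie at odd multiples of $\pi/\lambda$ with multiplicity m and whose peak value is $2^{m-1} = 8$.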
Theorem 3: Under the hypotheses of Theorem 2, the distribution

$$ \mu'(t) = \sum_k \mu_k\,\delta(t - t_k) - 1 $$

has an (unbiased) integral $\mu(t)$ satisfying

$$ -\frac{\pi}{\lambda} \le \mu(t) \le \frac{\pi}{\lambda}, \quad -\infty < t < \infty, $$

where equality holding (on either side) for one value of t implies

$$ t_k = \frac{2\pi}{\lambda}\,k + \theta $$

and

$$ \mu_k = \frac{2\pi}{\lambda}, \quad k = 0, \pm 1, \pm 2, \ldots. $$

Corollary 3.1: Under the hypotheses of Theorem 1 there is a function of
the form

$$ h(t) = J(t) - ct, $$

where J(t) is a jump function increasing by $m_k\pi$ at each zero $t_k$ of
multiplicity $m_k$, satisfying

$$ -\frac{\pi c}{\lambda} \le h(t) \le \frac{\pi c}{\lambda}, \quad -\infty < t < \infty, $$

and

$$ \int_{-\infty}^{\infty} h(t)\,f_\lambda(t)\,dt = 0 $$

for every $f_\lambda$ in $L_1$ whose Fourier transform vanishes outside $(-\lambda, \lambda)$,
$\lambda = c - b > 0$.

Furthermore, equality for one value of t in

$$ |h(t)| \le \frac{\pi c}{\lambda} $$

gives the same conclusion as Corollary 2.

Corollary 3.2: Let s(t) satisfy the hypotheses of Theorem 1 and denote
by $N_T(x)$ the number of zeros of s(t), counted according to multiplicity,
in the closed interval [x, x + T]. Then

$$ N_T(x) \le \frac{c}{\pi}\,T + \frac{2c}{\lambda}, \quad (-\infty < x < \infty) $$

with equality possible if, and only if,

$$ \frac{2c}{\lambda} = m, \text{ an integer}, $$

and

$$ s(t) = \frac{(-1)^n}{2}\left\{2\cos\left(\frac{\lambda t}{2} + \frac{n\pi}{m}\right)\right\}^m, \quad n \text{ an integer}, $$

and x and x + T are zeros of s(t).
Theorem 4: Let s(t) satisfy the hypotheses of Theorem 1. Then

$$ |s(t)| \le 2^{2c/\lambda - 1}, \quad -\infty < t < \infty, $$

where equality for any t gives the conclusion of Corollary 2.


III. PROOF OF THEOREM 1


The basic definition of the class $H(\lambda)$ of high-pass functions is as
follows.
Definition: $H(\lambda)$ consists of all functions h(t) satisfying

$$ 1. \quad \int_{x}^{x+T} |h(t)|\,dt < M(T), \quad -\infty < x < \infty, $$

$$ 2. \quad \int_{-\infty}^{\infty} h(t)\,f_\lambda(t)\,dt = 0, \quad \text{for all } f_\lambda \text{ in } L_1 $$

whose Fourier transforms vanish outside $(-\lambda, \lambda)$.


The basic representation theorem for $H(\lambda)$ established in Ref. 5 is
as follows.
Representation Theorem: A real-valued function h(t) belongs to $H(\lambda)$,
if, and only if,

$$ h(t) = \lim_{u \to 0+} h(t + iu) \quad \text{for almost all } t, $$

where

$$ h(t + iu) = \operatorname{Re}\{H(t + iu)\} $$

and $H(\tau)$ is analytic in the upper half-plane u > 0,

$$ |H(\tau)| = O(e^{-\lambda u}), \quad u \to \infty, $$

$$ \int_{x}^{x+T} |h(t + iu)|\,dt < M(T), \quad -\infty < x < \infty, \quad u \ge 0. $$

Actually, $H(\lambda)$ could have been defined to include distributions
having limited mass in any interval of fixed length. As it stands, the
closure of $H(\lambda)$ includes such distributions.

We consider

$$ H'(\tau) = i\,\frac{s'(\tau)}{s(\tau)} - c, \quad \tau = t + iu. \tag{20} $$

We have

$$ \frac{s'(\tau)}{s(\tau)} = \frac{-c \sin c\tau + g'(\tau)}{\cos c\tau + g(\tau)}. $$

Since g(t) and g'(t) are bandlimited to [-b, b], the growth of $g(\tau)$ and
$g'(\tau)$ is no faster than that of $\cos b\tau$. Thus,

$$ \frac{s'(t + iu)}{s(t + iu)} = -c \tan c(t + iu) + O(e^{-\lambda u}), \quad u \to \infty, $$

$$ \phantom{\frac{s'(t + iu)}{s(t + iu)}} = -ic + O(e^{-\lambda u}), \quad u \to \infty \quad (\lambda = c - b). \tag{21} $$
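We have

$$ \frac{s'(\tau)}{s(\tau)} = \lim_{R \to \infty} \sum_{|t_k| < R} \frac{m_k}{\tau - t_k}. \tag{22} $$

So

$$ \operatorname{Re}\left\{ i\,\frac{s'(\tau)}{s(\tau)} \right\} = \sum_k \frac{m_k u}{(t - t_k)^2 + u^2}. \tag{23} $$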
Setting

$$ h'(t + iu) = \operatorname{Re}\{H'(t + iu)\} \tag{24} $$

we have

$$ h'(t + iu) = \sum_k \frac{m_k u}{(t - t_k)^2 + u^2} - c, \tag{25} $$

where the sum is absolutely convergent (p. 86, Ref. 6).
Now let $N_T(x)$ denote the number of zeros of s(t), counted according
to multiplicity, in the interval [x - T, x + T]. Then

$$ \sum_k \frac{m_k u}{(x - t_k)^2 + u^2} \ge N_T(x)\,\frac{u}{T^2 + u^2}. $$

Thus

$$ N_T(x)\,\frac{u}{T^2 + u^2} \le c + h'(x + iu). $$

Setting u = T we have

$$ N_T(x) \le 2cT + 2T\,h'(x + iT). \tag{26} $$

Since $h'(x + iT) = O(e^{-\lambda T})$, $T \to \infty$, we have

$$ N_T(x) < 2cT + \epsilon, \quad \text{for sufficiently large } T. \tag{27} $$

Now

$$ \int_{x-T}^{x+T} |h'(t + iu)|\,dt \le 2cT + \int_{x-T}^{x+T} dt \sum_k \frac{m_k u}{(t - t_k)^2 + u^2} $$

$$ \le 2cT + \int_{-\infty}^{\infty} \frac{u}{(x - t)^2 + u^2}\,N_T(t)\,dt $$

$$ \le 2cT + \pi(2cT + \epsilon). \tag{28} $$

It follows from (28) and (21) and the Representation Theorem that

$$ \int_{-\infty}^{\infty} h'(t + iu)\,f_\lambda(t)\,dt = 0, \quad u > 0, \tag{29} $$

where $f_\lambda$ is any function of $L_1$ given by a finite Fourier integral over
$(-\lambda, \lambda)$. The integral in (29) is absolutely convergent:

$$ \int_{-\infty}^{\infty} |h'(t + iu)\,f_\lambda(t)|\,dt \le M(T) \sum_k \max_{kT \le t \le (k+1)T} |f_\lambda(t)| \quad \text{(see p. 101, Ref. 6)} $$

with the bound independent of u.
Thus we may write

$$ \int_{-\infty}^{\infty} h'(t + iu)\,f_\lambda(t)\,dt = \sum_k m_k \int_{-\infty}^{\infty} \frac{u\,f_\lambda(t)}{(t - t_k)^2 + u^2}\,dt - c\int_{-\infty}^{\infty} f_\lambda(t)\,dt $$

and let $u \to 0$ to obtain the quadrature formula

$$ \pi \sum_k m_k f_\lambda(t_k) = c\int_{-\infty}^{\infty} f_\lambda(t)\,dt. \tag{30} $$

This completes the proof of Theorem 1.


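IV. PROOF OF THEOREM 2
We are given for the stipulated $f_\lambda$,

$$ \sum_k \mu_k f_\lambda(t_k) = \int_{-\infty}^{\infty} f_\lambda(t)\,dt, \quad \mu_k > 0, \quad t_{k+1} > t_k, \tag{31} $$

and wish to prove that $t_{k+1} - t_k \le 2\pi/\lambda$, all k. To do this we take

$$ f_\lambda(t) = \frac{\cos^2(\lambda t/2)}{(\pi/\lambda)^2 - t^2}, \tag{32} $$

which satisfies the stipulation and

$$ f_\lambda(t) > 0, \quad |t| < \frac{\pi}{\lambda}; \qquad f_\lambda(t) \le 0, \quad |t| \ge \frac{\pi}{\lambda}, \tag{32a} $$

$$ \int_{-\infty}^{\infty} f_\lambda(t)\,dt = 0. \tag{32b} $$

Now suppose in (31) that

$$ t_{n+1} - t_n > \frac{2\pi}{\lambda}. \tag{33} $$

Setting $T_n = (t_{n+1} + t_n)/2$, we have for $f_\lambda$ given by (32),

$$ \sum_k \mu_k f_\lambda(t_k - T_n) = 0, \tag{34} $$

where in accord with (33), $|t_k - T_n| > \pi/\lambda$, all k. Since

$$ f_\lambda(t) \le 0 \quad \text{for } |t| > \pi/\lambda $$

we conclude, since $\mu_k > 0$, that $t_k - T_n$ is a zero of $f_\lambda(t)$, greater in
magnitude than the first; i.e., (33) implies

$$ |t_k - T_n| = (2\nu_k + 1)\,\frac{\pi}{\lambda}, \quad \nu_k \text{ a positive integer (all } k). \tag{35} $$

But now if we apply the formula (31) to $g_\lambda(t - T_n)$, where

$$ g_\lambda(t) = \frac{f_\lambda(t)}{(\pi/\lambda)^2 - t^2} = \left\{ \frac{\cos(\lambda t/2)}{(\pi/\lambda)^2 - t^2} \right\}^2 \ge 0, \tag{36} $$

we obtain, if (35) is true,

$$ \int_{-\infty}^{\infty} g_\lambda(t - T_n)\,dt = \sum_k \mu_k g_\lambda(t_k - T_n) = 0, \tag{37} $$

which is obviously false. Hence (33) is false, i.e., we must have

$$ t_{k+1} - t_k \le \frac{2\pi}{\lambda}, \quad \text{all } k. \tag{38} $$

Now suppose

$$ t_{n+1} - t_n = \frac{2\pi}{\lambda}. \tag{39} $$

With the same $f_\lambda$ as before, we conclude that (39) implies

$$ t_k - T_n = (2\nu_k + 1)\,\pi/\lambda, \quad \nu_k \text{ an integer (all } k). \tag{40} $$

In other words, (39) implies that $(t_k - t_j)$ is an even multiple of $\pi/\lambda$,
all k and j. We have established (38),

$$ 0 < t_{k+1} - t_k \le \frac{2\pi}{\lambda} \quad \text{for all } k. $$

Then (39) implies

$$ t_k = \frac{2\pi}{\lambda}\,k + \theta, \quad k = 0, \pm 1, \pm 2, \ldots. \tag{41} $$

With $t_j$ given by (41) we have for

$$ s_\lambda(t) = \left\{ \frac{\sin(\lambda t/2)}{\lambda t/2} \right\}^2, \tag{42} $$

$$ \int_{-\infty}^{\infty} s_\lambda(t - t_j)\,dt = \sum_k \mu_k s_\lambda(t_k - t_j) = \mu_j s_\lambda(0) = \mu_j = \frac{2\pi}{\lambda}, \quad \text{all } j, \tag{43} $$

which completes the proof of Theorem 2.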
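The properties claimed for the test function in this proof, namely the sign pattern (32a) and the vanishing integral (32b), can be checked numerically. The sketch below reads (32) as $f_\lambda(t) = \cos^2(\lambda t/2) / ((\pi/\lambda)^2 - t^2)$; the value $\lambda = 2$, the midpoint rule, and the truncation interval are assumptions made for illustration.

```python
import math

LAM = 2.0
A = math.pi / LAM   # location of the first zero of the test function

def f32(t):
    """cos^2(lam*t/2) / ((pi/lam)^2 - t^2), with the removable singularity filled."""
    denom = A * A - t * t
    if abs(denom) < 1e-9:
        return 0.0   # limiting value at t = +/- pi/lam
    return math.cos(0.5 * LAM * t) ** 2 / denom

def midpoint_integral(func, lo, hi, steps):
    """Composite midpoint rule on [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (i + 0.5) * width) for i in range(steps))

T = 2000.0 * A   # the neglected tails contribute roughly -1/T in total
integral = midpoint_integral(f32, -T, T, 400000)

inside = [f32(0.01 * i * A) for i in range(-99, 100)]    # |t| < pi/lam
outside = [f32(A + 0.01 * i) for i in range(0, 2000)]    # t >= pi/lam
print(integral, min(inside), max(outside))
```

Because the function decays only like $1/t^2$, the truncated integral is not exactly zero but of order 1/T, which is the behavior the check allows for.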


V. PROOF OF COROLLARY 2
Under the hypotheses of Theorem 1 we have the quadrature formula,

$$ \frac{\pi}{c} \sum_k m_k f_\lambda(t_k) = \int_{-\infty}^{\infty} f_\lambda(t)\,dt, $$

where $\{t_k\}$ are the zeros of s(t) having associated multiplicities $\{m_k\}$,
$m_k \ge 1$, $t_{k+1} - t_k > 0$. It follows from Theorem 2 that $t_{k+1} - t_k \le 2\pi/\lambda$,
i.e., that every interval of length $> 2\pi/\lambda$ contains at least one zero of
s(t). Furthermore, if $t_{k+1} - t_k = 2\pi/\lambda$ for some k, then (Theorem 2)

$$ \mu_k = \frac{\pi}{c}\,m_k = \frac{2\pi}{\lambda}, \quad \text{all } k, $$

$$ t_k = \frac{2\pi}{\lambda}\,k + \theta, \quad \text{all } k, $$

i.e., the zeros have a common multiplicity

$$ m_k = \frac{2c}{\lambda} = m, \text{ an integer } (\ge 2). $$

Thus we can have $t_{k+1} - t_k = 2\pi/\lambda$ only if $2c/\lambda$ is an integer m
(necessarily $\ge 2$, since $\lambda = c - b \le c$) and then only if all zeros have
multiplicity m on a lattice of spacing $2\pi/\lambda$; i.e., if

$$ s(t) = A\left\{ 2\cos\frac{\lambda}{2}(t - T) \right\}^m. $$

The possibilities for A and T are determined by the top frequency
content;

$$ s(t) = 2A\cos\frac{m\lambda}{2}(t - T) + g(t) = \cos\frac{m\lambda}{2}\,t + g(t), $$

i.e.,

$$ \frac{m\lambda T}{2} = n\pi, \qquad 2A = \cos n\pi. $$


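VI. PROOF OF THEOREM 3
The high-pass distribution

$$ \mu'(t) = \sum_k \mu_k\,\delta(t - t_k) - 1, \quad \mu_k > 0, \tag{44} $$

has an unbiased integral given by

$$ \mu(t) = \sum_k \mu_k I_\lambda(t - t_k) - \int_{-\infty}^{\infty} I_\lambda(x)\,dx, \tag{45} $$

where

$$ \int_{-\infty}^{\infty} |I_\lambda(t)|\,dt < \infty, \tag{45a} $$

$$ \int_{-\infty}^{\infty} I_\lambda(t)\,e^{-i\omega t}\,dt = \frac{1}{i\omega}, \quad |\omega| \ge \lambda. \tag{45b} $$

It was shown in Ref. 3 that an equivalent description of an integrating
kernel $I_\lambda(t)$ is

$$ I_\lambda(t) = \tfrac{1}{2}\operatorname{sgn} t - g_\lambda(t), \tag{46} $$

where $g_\lambda$ is any function bandlimited to $[-\lambda, \lambda]$ such that (45a) holds.
A useful construction for $g_\lambda$ is as follows.

Suppose $f_\lambda(t)$ is bandlimited to $[-\lambda, \lambda]$, even, and its analytic
continuation $f_\lambda(z)$ is positive on the imaginary axis. Then for each $\xi \ge 0$,

$$ E(t; \xi) = \frac{t\left[1 - \dfrac{f_\lambda(t)}{f_\lambda(i\xi)}\right]}{\pi(t^2 + \xi^2)} \tag{47} $$

is bandlimited to $[-\lambda, \lambda]$.
Now we define the bandlimited function $g_\lambda$ as

$$ g_\lambda(t) = \int_0^\infty E(t; \xi)\,d\xi
   = \frac{1}{\pi}\int_0^\infty \frac{t\,d\xi}{t^2 + \xi^2} - \frac{t f_\lambda(t)}{\pi}\int_0^\infty \frac{d\xi}{(t^2 + \xi^2) f_\lambda(i\xi)}
   = \tfrac{1}{2}\operatorname{sgn} t - t f_\lambda(t)\int_0^\infty \frac{d\xi}{\pi(t^2 + \xi^2) f_\lambda(i\xi)}. \tag{48} $$

This gives the useful representation for integrating kernels

$$ I_\lambda(t) = t f_\lambda(t)\int_0^\infty \frac{d\xi}{\pi(t^2 + \xi^2) f_\lambda(i\xi)}. \tag{49} $$

We have

$$ I_\lambda(t) \sim \frac{f_\lambda(t)}{\pi t}\int_0^\infty \frac{d\xi}{f_\lambda(i\xi)}, \quad t \to \infty. \tag{50} $$

So $I_\lambda$ will belong to $L_1$ if (and only if)

$$ \int_{|t|>1} \frac{|f_\lambda(t)|}{|t|}\,dt < \infty. \tag{51} $$

Now if

$$ f_\lambda(i\xi) \ge f_\lambda(0) \quad \text{for } \xi \ge 0, \tag{52} $$

as is always the case for even $f_\lambda$ having only real zeros,

$$ f_\lambda(i\xi) = f_\lambda(0)\prod_k \left(1 + \frac{\xi^2}{t_k^2}\right), $$

then

$$ \int_0^\infty \frac{d\xi}{(t^2 + \xi^2) f_\lambda(i\xi)} \le \frac{1}{f_\lambda(0)}\int_0^\infty \frac{d\xi}{t^2 + \xi^2} = \frac{\pi}{2|t|}\,\frac{1}{f_\lambda(0)} \quad (t \ne 0). \tag{53} $$

Thus if (51) and (52) hold, we have

$$ |I_\lambda(t)| \le \frac{1}{2}\,\frac{|f_\lambda(t)|}{f_\lambda(0)}, \tag{54} $$

with equality for some nonzero $t_0$ if and only if $f_\lambda(t_0) = 0$.
Now we choose

$$ f_\lambda(t) = \frac{1}{2}\left\{ \frac{\sin(\lambda t/2)}{\lambda t/2} \right\}^2 \tag{55} $$

so that the corresponding integrating kernel (49) is odd and satisfies

$$ -f_\lambda(t) \le I_\lambda(t) \le f_\lambda(t). \tag{56} $$

We then have for this $I_\lambda$

$$ \mu(t) = \sum_{k=-\infty}^{\infty} \mu_k I_\lambda(t - t_k), \tag{57} $$

and since $\mu_k > 0$, we have

$$ \mu(t) \ge -\sum_k \mu_k f_\lambda(t - t_k) \tag{57a} $$

$$ \mu(t) \le \sum_k \mu_k f_\lambda(t - t_k). \tag{57b} $$

But

$$ \sum_k \mu_k f_\lambda(t - t_k) = \int_{-\infty}^{\infty} f_\lambda(t - x)\,dx = \frac{\pi}{\lambda}. $$

Therefore,

$$ -\frac{\pi}{\lambda} \le \mu(t) \le \frac{\pi}{\lambda}, \quad -\infty < t < \infty. \tag{58} $$

We have equality holding on either side in (56) only for $t = 2k\pi/\lambda$,
$k \ne 0$, with

$$ I_\lambda(0+) = f_\lambda(0) = 1/2, $$
$$ I_\lambda(0-) = -f_\lambda(0) = -1/2. $$

So equality can hold in (57a) as a limit from the left, and in (57b) as
a limit from the right, if and only if

$$ t_k = \frac{2\pi}{\lambda}\,k + \theta, \quad \text{all } k, \tag{59} $$

implying as before

$$ \mu_k = \frac{2\pi}{\lambda}, \quad \text{all } k. \tag{60} $$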


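VII. PROOF OF COROLLARY 3.1
We set $h'(t)/c = \mu'(t)$ and apply Theorem 3 to obtain

$$ -\frac{\pi c}{\lambda} \le h(t) \le \frac{\pi c}{\lambda}. \tag{61} $$

If $f_\lambda$ is any function of $L_1$ whose Fourier transform vanishes outside
$(-\lambda, \lambda)$, then

$$ \int_{-\infty}^{\infty} f_\lambda(t)\,h(t)\,dt = \int_{-\infty}^{\infty} g_\lambda(t)\,h'(t)\,dt, \tag{62} $$

where

$$ g_\lambda(t) = \int_{-\infty}^{\infty} f_\lambda(x)\,I_\lambda(x - t)\,dx, \tag{62a} $$

$$ \int_{-\infty}^{\infty} |g_\lambda(t)|\,dt \le \int_{-\infty}^{\infty} |f_\lambda(t)|\,dt \int_{-\infty}^{\infty} |I_\lambda(t)|\,dt. \tag{62b} $$

Thus $g_\lambda$ belongs to $L_1$ and its Fourier transform

$$ \hat g_\lambda(\omega) = \hat f_\lambda(\omega)\,\hat I_\lambda(-\omega) $$

vanishes outside $(-\lambda, \lambda)$. Hence

$$ \int_{-\infty}^{\infty} f_\lambda(t)\,h(t)\,dt = \int_{-\infty}^{\infty} g_\lambda(t)\,h'(t)\,dt = 0. \tag{63} $$

We could also establish this result by using the Representation Theorem
for high-pass functions in Section III, and identifying h(t) as

$$ h(t) = \lim_{u \to 0+} \operatorname{Re}\{H(t + iu)\}, \tag{64} $$

where

$$ H(\tau) = i \log G(\tau) \tag{64a} $$

and

$$ G(\tau) = 2e^{ic\tau}s(\tau) = 1 + 2e^{ic\tau}g(\tau) + e^{2ic\tau}, \tag{65} $$

$$ G(t + iu) = 1 + O(e^{-\lambda u}), \quad u \to \infty. \tag{65a} $$

Here $H(\tau)$ is a particular integral of $H'(\tau)$ defined in (20), viz.,

$$ H(\tau) = -\int_{\tau}^{i\infty} H'(z)\,dz. \tag{66} $$

However, bounds on h(t) are not readily available from the representation
(64) or (66), but more so from the unbiased integral of h'(t).
The conditions for equality are argued as before in the proof of
Corollary 2.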


VIII. PROOF OF COROLLARY 3.2
We have

$$ \mu'(t) = \frac{1}{c}\,h'(t) = \frac{\pi}{c}\sum_k m_k\,\delta(t - t_k) - 1. \tag{67} $$

Then

$$ \int_{x-0}^{x+T+0} \mu'(t)\,dt = \mu(x + T + 0) - \mu(x - 0). \tag{68} $$

Now the number of zeros of s(t) in [x, x + T] is

$$ N_T(x) = \sum_{k:\; x \le t_k \le x+T} m_k. \tag{69} $$

Thus

$$ \int_{x-0}^{x+T+0} \mu'(t)\,dt = \frac{\pi}{c}\,N_T(x) - T. \tag{70} $$

According to Theorem 3

$$ -\frac{\pi}{\lambda} \le \mu(t) \le \frac{\pi}{\lambda}. $$

So

$$ \frac{\pi}{c}\,N_T(x) - T \le \frac{2\pi}{\lambda} $$

or

$$ N_T(x) \le \frac{2c}{\lambda} + \frac{c}{\pi}\,T. \tag{71} $$

The implications of equality are argued again as in Corollary 2.
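The case of equality in (71) can be made concrete with the lattice signal $\tfrac12 (2\cos \lambda t/2)^m$, whose zeros sit at odd multiples of $\pi/\lambda$ with multiplicity m. The sketch below uses the illustrative values m = 4 and $\lambda = 1.5$ and hypothetical helper names; it counts zeros in [x, x + T] and compares the count with the bound.

```python
import math

def zero_count_bound(c, lam, T):
    """Right-hand side of (71): c*T/pi + 2c/lam."""
    return c * T / math.pi + 2.0 * c / lam

def lattice_zero_count(m, lam, x, T):
    """Zeros of (2*cos(lam*t/2))**m in [x, x+T], counted with multiplicity m.

    The zeros lie at (2k + 1)*pi/lam; a tiny slack absorbs rounding at the ends.
    """
    spacing = 2.0 * math.pi / lam
    first = math.pi / lam
    k_min = math.ceil((x - first) / spacing - 1e-12)
    k_max = math.floor((x + T - first) / spacing + 1e-12)
    return m * max(0, k_max - k_min + 1)

m, lam = 4, 1.5
c = 0.5 * m * lam

# Interval whose end points are both zeros: the bound is attained.
x_eq, T_eq = math.pi / lam, 5 * (2.0 * math.pi / lam)
count_eq = lattice_zero_count(m, lam, x_eq, T_eq)
bound_eq = zero_count_bound(c, lam, T_eq)
print(count_eq, bound_eq)
```

For intervals whose end points are not both zeros, the count falls strictly below the bound, which matches the equality condition stated in Corollary 3.2.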


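IX. PROOF OF THEOREM 4
We have [cf. (64) and (65)]

$$ h(t) + i\hat h(t) = \lim_{u \to 0+} i \log\{2e^{ic(t + iu)}s(t + iu)\}, \tag{72} $$

where $\hat h$ is the Hilbert transform of h.
Then

$$ |s(t)| = \tfrac{1}{2}\exp[\hat h(t)]. \tag{73} $$

To obtain an upper bound on $\hat h$ we use (18); i.e.,

$$ \hat h(t) = \sum_k m_k L_\lambda(t - t_k) - \frac{c}{\pi}\int_{-\infty}^{\infty} L_\lambda(x)\,dx, \tag{74} $$

where

$$ L_\lambda(t) = \int_{-\infty}^{t} \frac{f_\lambda(x)}{x}\,dx, \quad t < 0, \tag{74a} $$

$$ L_\lambda(t) = L_\lambda(-t), \tag{74b} $$

$$ f_\lambda \text{ is bandlimited to } [-\lambda, \lambda] \tag{74c} $$

with $f_\lambda(0) = 1$, $f_\lambda(t) = f_\lambda(-t)$,

and such that $\int_{-\infty}^{\infty} |L_\lambda(t)|\,dt < \infty$.
We want an upper bound on $\hat h(t)$, say $\hat h(0)$. Since $\{m_k\}$ are positive
(integers), we would like

$$ L_\lambda(t) \le 0, \quad -\infty < t < \infty, \tag{75} $$

so that

$$ \hat h(t) \le -\frac{c}{\pi}\int_{-\infty}^{\infty} L_\lambda(x)\,dx, \quad -\infty < t < \infty. \tag{76} $$

Then equality, e.g.,

$$ \hat h(0) = -\frac{c}{\pi}\int_{-\infty}^{\infty} L_\lambda(x)\,dx, $$

would imply $L_\lambda(-t_k) = 0$, all k.

We expect the lattice distribution is extremal again, requiring $L_\lambda(t)$
to vanish with $(\cos \lambda t/2)^2$ in order to obtain the least upper bound for
$\hat h$. We need an alternate construction for $L_\lambda$. To obtain this we write

$$ L_\lambda(t) = -\int_t^{\infty} \frac{f_\lambda(x)}{x}\,dx, \quad t > 0, $$

$$ \phantom{L_\lambda(t)} = \log t + \int_t^1 \frac{1 - f_\lambda(x)}{x}\,dx - \int_1^{\infty} \frac{f_\lambda(x)}{x}\,dx. $$

Then, since $L_\lambda$ is even we have

$$ L_\lambda(t) = \log|t| + F_\lambda(t), \tag{77} $$

where $F_\lambda$ is an even entire function of exponential type $\lambda$ such that
$L_\lambda(t)$ is absolutely integrable,

$$ F_\lambda(t) = \int_{|t|}^{1} \frac{1 - f_\lambda(x)}{x}\,dx - \int_1^{\infty} \frac{f_\lambda(x)}{x}\,dx. \tag{77a} $$

Now we obtain an alternate construction for $F_\lambda$.
We suppose that $g_\lambda$ is an even entire function of exponential type
$\lambda$, positive on the imaginary axis and define for each $\xi \ge 0$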

Fx(t) = (77b) 
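$$ G_\lambda(t; \xi) = \frac{\xi\left[1 - \dfrac{g_\lambda(t)}{g_\lambda(i\xi)}\right]}{t^2 + \xi^2} - \frac{\xi}{1 + \xi^2}. \tag{78} $$

Then we set

$$ F_\lambda(t) = \int_0^\infty G_\lambda(t; \xi)\,d\xi
   = \int_0^\infty \left\{ \frac{\xi}{t^2 + \xi^2} - \frac{\xi}{1 + \xi^2} \right\} d\xi - g_\lambda(t)\int_0^\infty \frac{\xi\,d\xi}{(t^2 + \xi^2)\,g_\lambda(i\xi)} $$

$$ \phantom{F_\lambda(t)} = -\log|t| - g_\lambda(t)\int_0^\infty \frac{\xi\,d\xi}{(t^2 + \xi^2)\,g_\lambda(i\xi)} \tag{79} $$

and then

$$ L_\lambda(t) = \log|t| + F_\lambda(t) = -g_\lambda(t)\int_0^\infty \frac{\xi\,d\xi}{(t^2 + \xi^2)\,g_\lambda(i\xi)}. \tag{80} $$

We have

$$ L_\lambda(t) \sim -\frac{g_\lambda(t)}{t^2}\int_0^\infty \frac{\xi\,d\xi}{g_\lambda(i\xi)}, \quad t \to \infty. \tag{81} $$

So $L_\lambda$ will be absolutely integrable if (and only if)

$$ \int_{|t|>1} |g_\lambda(t)|\,\frac{dt}{t^2} < \infty. \tag{82} $$

The particular $L_\lambda$ we want is

$$ L_\lambda(t) = -\cos^2\frac{\lambda t}{2}\int_0^\infty \frac{\xi\,d\xi}{(t^2 + \xi^2)\cosh^2\dfrac{\lambda\xi}{2}}. \tag{83} $$

We can evaluate the integral of $L_\lambda$ indirectly by considering

$$ \frac{2c}{\lambda} = m, \qquad s(t) = \frac{1}{2}\left(2\cos\frac{\lambda t}{2}\right)^m. $$

Then

$$ \log 2|s(t)| = \sum_k m L_\lambda(t - t_k) - \frac{c}{\pi}\int_{-\infty}^{\infty} L_\lambda(x)\,dx, $$

where

$$ t_k = (2k + 1)\,\frac{\pi}{\lambda}, $$

$$ \log 2|s(0)| = -\frac{c}{\pi}\int_{-\infty}^{\infty} L_\lambda(x)\,dx, $$

$$ m \log 2 = -\frac{m\lambda}{2\pi}\int_{-\infty}^{\infty} L_\lambda(x)\,dx, $$

i.e., for $L_\lambda$ given by (83) we have

$$ \int_{-\infty}^{\infty} L_\lambda(t)\,dt = -\frac{2\pi}{\lambda}\log 2. \tag{84} $$

Thus we have from (84) and (76)

$$ \hat h(t) \le \frac{2c}{\lambda}\log 2, \quad -\infty < t < \infty, \tag{85} $$

with equality attainable only for the lattice distribution, giving

$$ |s(t)| \le 2^{2c/\lambda - 1}, \quad -\infty < t < \infty, \tag{86} $$

where equality for any t implies the conclusion of Corollary 2, by the
same argument.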
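As a cross-check on (84), the integral can also be evaluated directly by interchanging the order of integration. This is a sketch that is not part of the original argument; it reads (83) as $L_\lambda(t) = -\cos^2(\lambda t/2)\int_0^\infty \xi\,d\xi / [(t^2+\xi^2)\cosh^2(\lambda\xi/2)]$ and assumes the interchange is justified by absolute integrability.

```latex
\begin{aligned}
\int_{-\infty}^{\infty} L_\lambda(t)\,dt
 &= -\int_0^\infty \frac{\xi}{\cosh^2(\lambda\xi/2)}
     \left[\int_{-\infty}^{\infty} \frac{\cos^2(\lambda t/2)}{t^2+\xi^2}\,dt\right] d\xi \\
 &= -\int_0^\infty \frac{\xi}{\cosh^2(\lambda\xi/2)}\cdot
     \frac{\pi\,(1+e^{-\lambda\xi})}{2\xi}\,d\xi
 && \text{since } \int_{-\infty}^{\infty}\frac{\cos\lambda t}{t^2+\xi^2}\,dt
      =\frac{\pi}{\xi}\,e^{-\lambda\xi} \\
 &= -\frac{\pi}{2}\int_0^\infty \frac{4e^{-\lambda\xi}}{1+e^{-\lambda\xi}}\,d\xi
 && \text{since } \cosh^2(\lambda\xi/2)=\tfrac14\,e^{\lambda\xi}(1+e^{-\lambda\xi})^2 \\
 &= -\frac{2\pi}{\lambda}\Bigl[-\log\bigl(1+e^{-\lambda\xi}\bigr)\Bigr]_0^\infty
  \;=\; -\frac{2\pi}{\lambda}\log 2 .
\end{aligned}
```

The result agrees with the value obtained indirectly from the lattice signal.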
Click Modulation
(Manuscript received March 26, 1982)


In this paper we show how one may determine a sequence of equal intensity
impulses or clicks

$$ \pi \sum_k \delta(t - t_k) $$

such that a desired bandpass signal, f'(t), may be obtained by filtering the
clicks; i.e.,

$$ f'(t) = \pi \sum_k K(t - t_k), $$

where K(t) is the impulse response of a suitable bandpass filter. The $\{t_k\}$ are
found as the zeros of a bandlimited signal s(t), where if f(t), the bandpass
signal whose derivative is f'(t), is sufficiently small, we also have

$$ f(t) = \int_{-\infty}^{\infty} q(x)\,K(t - x)\,dx, $$

where q(x) is a square wave simply related to s(t).


I. INTRODUCTION


Click modulation describes a sort of pulse-position modulation
leading to a sequence of equal-intensity impulses or clicks,

$$ \pi \sum_k \delta(t - t_k) \tag{1} $$

such that a desired signal, say f'(t), may be obtained by filtering the
clicks; i.e.,

$$ f'(t) = \pi \sum_k K(t - t_k). \tag{2} $$

Here we suppose that f'(t) is a bandpass signal with spectrum confined
to $[\lambda, \mu]$ and $[-\mu, -\lambda]$, $0 < \lambda < \mu < \infty$, and K(t) is any function
satisfying

$$ \int_{-\infty}^{\infty} |K(t)|\,dt < \infty, \tag{3a} $$

$$ \int_{-\infty}^{\infty} K(t)\,dt = 0, \tag{3b} $$

$$ \hat K(\omega) = \int_{-\infty}^{\infty} K(t)\,e^{-i\omega t}\,dt =
   \begin{cases} 1, & \lambda \le |\omega| \le \mu, \\ 0, & |\omega| \ge \alpha. \end{cases} \tag{3c} $$

The filter characterized by K(t) is required to reproduce f'(t), reject
dc, and reject frequencies greater than $\alpha$, where $\alpha$ is some specified
frequency ($\alpha > \mu$). (See Fig. 1.) In other words, we are supposing some
constant c, such that the “Fourier transforms” of f'(t) and h'(t) agree
over $(-\alpha, \alpha)$, where

$$ h'(t) = \pi \sum_k \delta(t - t_k) - c. \tag{4} $$

The distribution h'(t) then has no spectrum in $(-\lambda, \lambda)$ nor in the
guard band $(\mu, \alpha)$ and its reflection. In some applications a large guard
band may be required to ease the filtering problem, while in others,
e.g., audio, a small guard band may be tolerable.

* AT&T Bell Laboratories.


II. THE SOLUTION OF THE PROBLEM OF CLICK MODULATION


The problem of click modulation is given f'(t), $\lambda$, $\mu$, and $\alpha$, to find a
set $\{t_k\}$ such that (2) holds. The basis for solving this problem is found
in Refs. 1 through 4. The $\{t_k\}$ are assumed to be the zeros (real and
simple) of a signal of the form

$$ s(t) = \cos ct + g(t), \tag{5} $$

where g is bandlimited to [-b, b] and

$$ c - b = \lambda > 0, \tag{5a} $$

$$ c > \alpha > \mu. \tag{5b} $$

Fig. 1. Filter characteristic $\hat K(\omega)$ for bandlimiting impulse train (clicks) to recover
desired signal f'(t) of spectral support $[-\mu, -\lambda] \cup [\lambda, \mu]$; the guard band lies
between $\mu$ and $\alpha$ on the frequency axis.


Then the function defined by

$$ H(\tau) = i \log[2e^{ic\tau}s(\tau)] \tag{6} $$

is analytic in the upper half-plane of the complex variable $\tau = t + iu$,
where the principal branch of log is taken;

$$ \log(1 + z) = z + O(z^2), \quad z \to 0, $$

giving

$$ |H(t + iu)| = O(e^{-\lambda u}), \quad u \to \infty. \tag{7} $$

The function h(t) defined by

$$ h(t) = \lim_{u \to 0+} \operatorname{Re}\{H(t + iu)\} \tag{8} $$

has the form

$$ h(t) = J(t) - ct, \tag{9} $$

where J(t) is a jump function increasing by $\pi$ at each zero $t_k$ of s, and
h is high-pass with no spectrum in $(-\lambda, \lambda)$.

We then suppose that f(t), with spectrum confined to $[\lambda, \mu]$ and
$[-\mu, -\lambda]$, is given by

$$ f(t) = \int_{-\infty}^{\infty} h(x)\,K(t - x)\,dx, \tag{10} $$

where K satisfies (3a), (3b), and (3c). Then differentiating, we obtain

$$ f'(t) = \int_{-\infty}^{\infty} h'(x)\,K(t - x)\,dx = \pi \sum_k K(t - t_k). \tag{11} $$

Fig. 2. Waveform relations in click modulation.


The relationships of s(t), h(t), f(t), h'(t), and f'(t) are illustrated in
Fig. 2.

Now we would like to find s(t) and hence h(t) so that (10) holds. To
do this it is convenient to work with “analytic signals” having no
negative frequency content, or in the terminology of Ref. 4, functions
whose “Fourier transforms vanish over $(-\infty, 0)$”. (Refer to Fig. 3 in
the course of the following development.) Thus we introduce

$$ F(t) = f(t) + i\hat f(t), \tag{12} $$

where $\hat f$ is the Hilbert transform of f. Then the Fourier transform of
F vanishes outside the single interval $[\lambda, \mu]$. [It is important to note
that a bounded bandpass signal f always has a bounded Hilbert
transform $\hat f$.] Now we require that the Fourier transforms of H(t) and
F(t) agree over $(-\infty, \alpha)$, i.e., in accord with (10),

$$ F(t) = \int_{-\infty}^{\infty} H(x)\,K(t - x)\,dx. \tag{13} $$

404 TECHNICAL JOURNAL, MARCH 1984

Fig. 3. Depiction of Fourier Transform (F.T.) relations in derivation of click modulation.
Panels (horizontal axis: frequency): i. F.T. of f'(t); ii. F.T. of f(t);
iii. F.T. of $F(t) = f(t) + i\hat f(t)$; iv. F.T. of $Z(t) = \exp[-iF(t)]$;
v. F.T. of $K_{\alpha,\beta}(t)$ (filter characteristic); vi. F.T. of $z(t) = K_{\alpha,\beta} \circledast Z(t)$;
vii. F.T. of log z(t); viii. F.T. of $\phi(t) = \operatorname{Im}\{\log z(t)\}$; ix. F.T. of $\bar z(t)$;
x. F.T. of $G(t) = z(t) + e^{2i(ct+\theta)}\bar z(t)$; xi. F.T. of log G(t);
xii. F.T. of $h(t) = \operatorname{Re}\{H(t)\} = -\operatorname{Im}\{\log G(t)\}$;
xiii. F.T. of $\pi\sum\delta(t - t_k) = h'(t) + c$ (clicks); xiv. F.T. of K(t);
xv. F.T. of click filter output = f'(t).


We define

$$ Z(t) = \exp[-iF(t)] \tag{14} $$

and then bandlimit Z to obtain

$$ z(t) = \int_{-\infty}^{\infty} Z(x)\,K_{\alpha,\beta}(t - x)\,dx, \tag{15} $$

where the filter kernel $K_{\alpha,\beta}$ is absolutely integrable and

$$ \hat K_{\alpha,\beta}(\omega) = \int_{-\infty}^{\infty} K_{\alpha,\beta}(t)\,e^{-i\omega t}\,dt
   = \begin{cases} 1 & \text{for } 0 \le \omega \le \alpha, \\ 0 & \text{for } \omega \ge \beta, \end{cases} \quad (\beta > \alpha). \tag{15a} $$

Thus z(t) is bandlimited to $[0, \beta]$, and the Fourier transforms of z(t)
and Z(t) agree over $(-\infty, \alpha)$.


Now we set

$$ G(t) = G(t; \theta, c) = z(t) + e^{2i(ct+\theta)}\,\bar z(t)
        = e^{i(ct+\theta)}\left[ z(t)e^{-i(ct+\theta)} + \bar z(t)e^{i(ct+\theta)} \right], \tag{16} $$

where

$$ 2c - \beta \ge \alpha, \tag{16a} $$

$$ \theta \text{ is any fixed angle}, \tag{16b} $$

and $\bar z(t)$ is the complex conjugate of z(t). Now the Fourier transform
of $\bar z(t)$ vanishes outside $[-\beta, 0]$. Thus the Fourier transform of

$$ e^{2i(ct+\theta)}\,\bar z(t) $$

vanishes outside $[2c - \beta, 2c]$, and since $2c - \beta \ge \alpha$, the Fourier
transforms of Z(t), z(t), and G(t) agree over $(-\infty, \alpha)$. These relations
are depicted in Fig. 3, parts (iv), (vi), and (x). Note that

$$ Z(t) = \exp[-iF(t)] = 1 - iF(t) - \frac{F^2(t)}{2!} + \cdots, \tag{17} $$

and since the Fourier transform of $\{F(t)\}^n$ vanishes outside $[n\lambda, n\mu]$,
Z(t) has a spectral gap $(0, \lambda)$.
We may write (16) as

$$ G(t) = 2e^{i(ct+\theta)}s(t), \tag{18} $$

where

$$ s(t) = \frac{1}{2}\left[ z(t)e^{-i(ct+\theta)} + \bar z(t)e^{i(ct+\theta)} \right] \tag{19} $$

is real-valued and bandlimited to [-c, c], and is of the form

$$ s(t) = \cos(ct + \theta) + g(t), \tag{19a} $$

where g is bandlimited to [-b, b], $b = c - \lambda$.
Now we suppose $s(\tau)$ has only real zeros. Then $\log G(\tau)$ is analytic
in the upper half-plane. We compare it with

$$ \log Z(\tau) = -iF(\tau), $$

which is certainly analytic in the upper half-plane. Since the Fourier
transforms of Z(t) and G(t) agree over $(-\infty, \alpha)$ and both $\log Z(\tau)$ and
$\log G(\tau)$ are analytic in the upper half-plane, it follows from the theory
in Ref. 4 that the Fourier transforms of log G(t) and log Z(t) also
agree over $(-\infty, \alpha)$; i.e., the Fourier transforms of

$$ H(t) = i \log G(t) $$

and

$$ F(t) = i \log Z(t) $$

agree over $(-\infty, \alpha)$ provided $s(\tau)$, the analytical continuation of s(t)
given by (19), has only real zeros $\{t_k\}$. We also require the zeros to be
simple so that h(t) = Re{H(t)} has the form (9). It was shown in Ref.
5 that a sufficient condition for s of the form (19), ($c > \beta/2$), to have
only real simple zeros is that $z(\tau)$ be zero-free in the (closed) upper
half-plane, $\operatorname{Im} \tau \ge 0$.

Thus if $z(\tau)$, the analytic continuation of z(t) obtained by bandlimiting
Z(t) where Z(t) = exp[-iF(t)], is zero-free in the upper half-plane
$\operatorname{Im} \tau \ge 0$, then $\log z(\tau)$ is analytic in the upper half-plane and
then the Fourier transforms of

$$ H(t) = i \log G(t), $$
$$ F(t) = i \log Z(t), $$

and

$$ i \log z(t) $$

agree over $(-\infty, \alpha)$, which means that the Fourier transforms of the
real (imaginary) parts of these functions agree over $(-\alpha, \alpha)$.
Writing

$$ z(t) = A(t)e^{i\phi(t)}, \tag{20} $$

where

$$ A(t) = |z(t)|, $$
$$ \phi(t) = \operatorname{Im}\{\log z(t)\} = \text{phase } z(t), $$

we have the Fourier transforms of h(t), f(t), and $-\phi(t)$ agreeing over
$(-\alpha, \alpha)$ whenever $z(\tau)$ is zero-free in the upper half-plane, $\operatorname{Im} \tau \ge 0$.
Using (20), we may write

$$ s(t) = A(t)\cos[ct + \theta - \phi(t)]. \tag{21} $$

Thus the zeros $t_k$ of s(t) are the zeros of the phase-modulated cosine,

$$ \cos[ct + \theta - \phi(t)], \tag{22} $$

where $\phi(t)$ is the phase function of an analytic signal z(t), with z(t)
bandlimited to $[0, \beta]$ and z(t + iu) zero-free for $u \ge 0$, such that the
Fourier transforms of $-\phi(t)$ and f(t) agree over $(-\alpha, \alpha)$, ($\mu < \alpha < \beta$),
and

$$ c \ge \frac{\alpha + \beta}{2}. $$

Now $\phi(t)$ is not bandlimited, but

$$ -\phi(t) = f(t) + \epsilon(t), \tag{23} $$

where $\epsilon(t)$ is high-pass with no spectrum in $(-\alpha, \alpha)$ and $|\epsilon|$ may be
small compared to |f| if |f| is small, or if $\alpha$, $\beta$, and c are large compared
to $\mu$.

We have noted that the Fourier transforms of —¢(t), f(t), and h(t) 
agree over (—a, a). It is interesting to observe that 


h(tr) = —o(te) (24) 


provided we take 
1 
h(t.) = 3 [A(t.+) + h(tr—)], (24a) 
which, incidentally, follows by defining 


h(t) = lim Re{H(¢ + iu)}. 
u—0+ 
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To see (24) we consider 


log G(t) = log 2(r) = log {1 te 2(7) cor} 
2(7) 
= log{1 +U(z)], (25) 
where 
U(t) = eirlet+0—9(6)) 
and 
|UE+in)|<eP™, uz. (25a) 


We have shown elsewhere’ that 
p(t) < B/2 


so that the total phase of $U(t)$ is an increasing function. With
$|U(t)| = 1$, we have for the principal branch of the logarithm

$$-\pi/2 < \arg\{1 + U(t)\} < \pi/2. \tag{26}$$

We have $U(t_k) = -1$ and, since the phase is increasing,

$$\arg\{1 + U(t_k-)\} = \pi/2,$$
$$\arg\{1 + U(t_k+)\} = -\pi/2.$$

Thus

$$\operatorname{Re}\{i\log G(t_k-) - i\log z(t_k-)\} = h(t_k-) + \phi(t_k) = -\pi/2,$$
$$\operatorname{Re}\{i\log G(t_k+) - i\log z(t_k+)\} = h(t_k+) + \phi(t_k) = \pi/2,$$

from which (24) follows. In fact, we have from (25) and (26)

$$-\pi/2 \le h(t) + \phi(t) \le \pi/2. \tag{27}$$

Then (24) follows from (27) and the fact that $\phi(t)$ is continuous with
$h(t)$ increasing by $\pi$ at $t_k$.

The condition that $z(\tau)$ be zero-free in the upper half-plane is
difficult to quantify precisely in terms of all the parameters. Generally
speaking, for fixed $\alpha$ and $\beta$, $z(\tau)$ will be zero-free in the
upper half-plane if $\sup|F(t)|$ is sufficiently small. The problem is much the same
as that of determining the bandwidth requirements for exponential 
single-sideband modulation.* The problem will be treated in detail in 
a future paper. Suffice it here to say that a sufficient condition is 


$$\operatorname{Re}\{z(t)\} > 0, \qquad -\infty < t < \infty, \tag{28}$$

or

$$-\pi/2 < \phi(t) < \pi/2, \qquad -\infty < t < \infty. \tag{29}$$

This condition is also of interest in obtaining a square wave representation
of $f(t)$.


III. SQUARE WAVE REPRESENTATION OF f(t)
When $|\phi(t)| < \pi/2$, the zeros $t_k$ of

$$s(t) = A(t)\cos[ct + \theta - \phi(t)] \tag{30}$$

interlace with the zeros of

$$\sin(ct + \theta).$$

For simplicity of discussion, we assume $\theta = 0$ (consider a translate).
Then, as shown in Ref. 1, we have

$$h(k\pi/c) = 0, \qquad k = 0, \pm 1, \pm 2, \cdots \tag{31}$$
$$-\pi < h(t) < \pi. \tag{32}$$

Thus if we subtract from $h(t)$ the periodic sawtooth function defined by

$$\sigma(t) = \pi/2 - ct, \quad 0 < t < \pi/c, \qquad \sigma(t) = \sigma(t + \pi/c), \qquad \sigma(0) = 0, \tag{33}$$

we obtain a square wave,

$$q(t) = h(t) - \sigma(t) = -\frac{\pi}{2}\,\{\operatorname{sgn} s(t)\}\cdot\{\operatorname{sgn} \sin ct\} \tag{34}$$

(see Fig. 4).
The Fourier series for $\sigma(t)$ is

$$\sigma(t) = \sum_{n=1}^{\infty} \frac{1}{n}\sin 2nct;$$

so the Fourier transforms of $q(t)$ and $h(t)$ agree over $(-2c, 2c)$, and
since $c > \alpha$, the Fourier transforms of $q(t)$ and $f(t)$ agree over
$(-\alpha, \alpha)$. Thus we may filter the square wave to obtain $f(t)$:

$$f(t) = \int_{-\infty}^{\infty} q(x)\,K(t - x)\,dx, \tag{35}$$

where $K$ is a reproducing kernel for $f$, which rejects all frequencies
outside $(-\alpha, \alpha)$.
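The sawtooth of (33) and its Fourier series can be compared numerically. The following is a minimal sketch, not part of the original paper; the function names and the choice c = 2.5 are illustrative assumptions.

```python
import math

def sawtooth(t, c):
    """Sawtooth of (33): pi/2 - c*t on (0, pi/c), repeated with period pi/c."""
    period = math.pi / c
    r = math.fmod(t, period)
    if r < 0:
        r += period
    if r == 0.0:
        return 0.0  # value assigned at the jump, sigma(0) = 0
    return math.pi / 2 - c * r

def sawtooth_partial_sum(t, c, terms):
    """Partial sum of the Fourier series sum_{n>=1} sin(2*n*c*t)/n."""
    return sum(math.sin(2 * n * c * t) / n for n in range(1, terms + 1))

# Illustrative comparison away from the jumps (c = 2.5 is an assumed value).
for t in (0.3, 1.0, 2.0):
    print(t, sawtooth(t, 2.5), sawtooth_partial_sum(t, 2.5, 5000))
```

Because the series contains only the frequencies 2c, 4c, ..., subtracting the sawtooth from h(t) leaves the spectrum below 2c untouched, which is the property used in the text.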

The square wave representation of $f(t)$ is of practical interest. First,
one may regard the formulas (34) and (35) as a practical way of
demodulating $s(t)$ after it is transmitted through a nonlinear medium.
That is, $\operatorname{sgn} s(t)$ is formed and then multiplied by
$\operatorname{sgn}\{\sin(ct + \Delta)\}$, where $\Delta$ is adjusted (to some
multiple of $\pi$) by a phase-lock loop so that the average value of the
product is zero. Then, filtering the resultant square wave gives a signal
proportional to $f(t)$. In this application one may think of $c$ large, with
$\alpha$ and $\beta$ not much larger than $\mu$, the top frequency of $f$, so
that $s(t)$ has the character of a single-sideband (lower-sideband) signal
with spectrum confined to $[c - \beta, c]$ and $[-c, -(c - \beta)]$.

Fig. 4—Sum and product representation of square wave q(t). (Panels: $h(t)$
and $\sigma(t)$; $q(t) = h(t) - \sigma(t)$; $\operatorname{sgn} s(t)$ and
$-\operatorname{sgn}\sin ct$; $q(t) = -\tfrac{\pi}{2}\{\operatorname{sgn} s(t)\}\{\operatorname{sgn}\sin ct\}$.)

In another application, $f(t)$ may be thought of as an audio signal
that is to be reproduced by driving a loudspeaker with a switching-type
(class D) power amplifier having very good efficiency. The Sony
Corporation has marketed a switching-type amplifier using an approximate
method of obtaining the square wave.® Their approximation is
equivalent to taking $t_k$ to be the zeros of [cf. (22), (23)]

$$\cos[ct + \theta + f(t)],$$

where

$$-\pi/2 < f(t) < \pi/2,$$

which results in a good approximation for $c \gg \mu$ (in their case,
$c = 2\pi \cdot 500$ kHz) but requires an unnecessarily high switching frequency.
[In the Sony system the square wave is generated by clipping the sum
of the sawtooth $\sigma(t)$ and the signal $f(t)$; i.e.,

$$\frac{\pi}{2}\operatorname{sgn}\{\sigma(t) + f(t)\},$$

which is a conventional way of obtaining analog pulse-width modulation.
This is to be compared with (34), which may be equivalently written

$$q(t) = -\frac{\pi}{2}\operatorname{sgn}\{\sigma(t) + \phi(t)\}.]$$


IV. IMPLEMENTATION 


Figure 5 is a block diagram of a click modulation system, including
the optional square wave output. The input is the bandpass signal
$f'(t + \Delta)$, which is fed to a Hilbert transform network, incurring a
delay $\Delta$, to obtain $\hat f'(t)$. The input is correspondingly delayed to
obtain $f'(t)$. Then $f'(t)$ and $\hat f'(t)$ are fed to an Analytic
Exponential Modulator (AEM), which furnishes the outputs

$$X(t) = e^{\hat f(t)}\cos[f(t)] \tag{36}$$
$$Y(t) = -e^{\hat f(t)}\sin[f(t)], \tag{37}$$

where

$$X(t) + iY(t) = Z(t) = \exp[-iF(t)] \tag{38}$$

and

$$F(t) = f(t) + i\hat f(t). \tag{39}$$


These outputs are then bandlimited with identical low-pass filters
having unity transmission over the band $(-\alpha, \alpha)$ and zero
transmission outside the band $(-\beta, \beta)$ to obtain the signals $x(t)$
and $y(t)$, where

$$x(t) + iy(t) = z(t). \tag{40}$$

Then these signals are multiplied by $\cos(ct + \theta)$ and
$\sin(ct + \theta)$ and then added to obtain the signal

$$s(t) = x(t)\cos(ct + \theta) + y(t)\sin(ct + \theta), \tag{41}$$

where the carrier frequency $c$ must satisfy

$$c \ge \frac{\alpha + \beta}{2}.$$

Fig. 5—A click modulation system.

The level of the input signal should be adjusted so that consecutive
zeros of $s(t)$ never coalesce (in loud passages), or alternatively, to use
the square wave output, the input level should be adjusted (made
sufficiently small) so that $x(t)$ is always positive.
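The equivalence between the quadrature form (41) and the amplitude-phase form (21) can be checked numerically. The sketch below is illustrative only: it borrows z(t) = 1 + (1/2)e^{it} and c = 5/2 from the numerical example of Section VI, and the function names are assumptions.

```python
import cmath
import math

def z_example(t):
    """Bandlimited analytic signal z(t) = 1 + (1/2) e^{it} of the Section VI example."""
    return 1 + 0.5 * cmath.exp(1j * t)

def s_quadrature(t, c, theta=0.0):
    """Eq. (41): s = x cos(ct + theta) + y sin(ct + theta), with z = x + iy."""
    z = z_example(t)
    return z.real * math.cos(c * t + theta) + z.imag * math.sin(c * t + theta)

def s_polar(t, c, theta=0.0):
    """Eq. (21): s = A cos(ct + theta - phi), with z = A e^{i phi}."""
    z = z_example(t)
    return abs(z) * math.cos(c * t + theta - cmath.phase(z))

for t in (0.0, 0.7, 2.0, 5.5):
    print(t, s_quadrature(t, 2.5), s_polar(t, 2.5))
```

Both forms equal Re{z(t)e^{-i(ct+θ)}}, so the zeros of s(t) are the zeros of the phase-modulated cosine whenever A(t) does not vanish.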

To obtain the click output, uniform pulses are formed at the zeros
of $s(t)$. This may be accomplished as indicated by infinitely clipping
$s(t)$, then differentiating, rectifying the resulting pulses, and using
these to trigger a one-shot multivibrator. These pulses may be filtered
to obtain $f'(t)$, the derivative of $f(t)$. To obtain the square wave
output, the clipped signal $s(t)$, i.e., $\operatorname{sgn} s(t)$, is
multiplied by the clipped sine wave, $-\operatorname{sgn}\{\sin(ct + \theta)\}$.
When it is filtered, the resulting square wave, with suitable scaling
(multiplied by $\pi/2$), will give $f(t)$, provided the input level is small
enough so that $x(t)$ is always positive.


V. THE ANALYTIC EXPONENTIAL MODULATOR 


The functions $X(t)$ and $Y(t)$, given by (36) and (37), may be
generated using function generators and multipliers. However, they
may be generated more simply in a feedback loop, which can be made
stable, using the fact that $\{X(t) - 1\}$ and $Y(t)$ are high-pass signals
containing no frequencies lower than $\lambda$, the lower frequency of $f(t)$.

Differentiating (38) we obtain

$$Z'(t) = -iF'(t)Z(t)$$

or

$$X'(t) + iY'(t) = \{\hat f'(t) - if'(t)\}\{X(t) + iY(t)\}.$$

Then, equating real and imaginary parts we have

$$X'(t) = \hat f'(t)X(t) + f'(t)Y(t) \tag{42}$$
$$Y'(t) = \hat f'(t)Y(t) - f'(t)X(t). \tag{43}$$

Now integrators and multipliers can be connected so as to solve this
pair of differential equations for $X(t)$ and $Y(t)$, given arbitrary
functions $f'(t)$ and $\hat f'(t)$. However, drifts and offsets would soon
cause trouble, resulting in exponential growth of the functions. This can be
avoided when $f$ and $\hat f$ are high-pass Hilbert transform pairs.
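As a numerical illustration of (42) and (43), the pair of equations can be integrated and compared with the closed forms (36) and (37). This is a sketch under assumptions, not the analog circuit of the paper: it uses a fourth-order Runge-Kutta step, and the test signal f(t) = -(1/2) sin t + (1/8) sin 2t with its Hilbert transform is borrowed from the numerical example of Section VI.

```python
import math

def f(t):
    return -0.5 * math.sin(t) + 0.125 * math.sin(2 * t)

def f_hat(t):
    # Hilbert transform of f(t) for this example signal.
    return 0.5 * math.cos(t) - 0.125 * math.cos(2 * t)

def df(t):
    return -0.5 * math.cos(t) + 0.25 * math.cos(2 * t)

def df_hat(t):
    return -0.5 * math.sin(t) + 0.25 * math.sin(2 * t)

def rhs(t, X, Y):
    """Right-hand sides of (42) and (43)."""
    return (df_hat(t) * X + df(t) * Y,
            df_hat(t) * Y - df(t) * X)

def integrate(t_end, steps):
    """Classical RK4 integration of (42), (43) starting from the exact value at t = 0."""
    h = t_end / steps
    t = 0.0
    X = math.exp(f_hat(0.0)) * math.cos(f(0.0))
    Y = -math.exp(f_hat(0.0)) * math.sin(f(0.0))
    for _ in range(steps):
        k1 = rhs(t, X, Y)
        k2 = rhs(t + h / 2, X + h / 2 * k1[0], Y + h / 2 * k1[1])
        k3 = rhs(t + h / 2, X + h / 2 * k2[0], Y + h / 2 * k2[1])
        k4 = rhs(t + h, X + h * k3[0], Y + h * k3[1])
        X += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        Y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return X, Y

X_num, Y_num = integrate(2.0, 2000)
print(X_num, math.exp(f_hat(2.0)) * math.cos(f(2.0)))
print(Y_num, -math.exp(f_hat(2.0)) * math.sin(f(2.0)))
```

A digital integration like this does not suffer from the drifts and offsets mentioned in the text; those are a concern for the analog integrators.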

In the theory we have treated all signals as dimensionless quantities.
Now, for clarity in implementing the analog circuitry, we attach the
dimension of volts to all the signals and write

$$Z(t) = B\exp[-iF(t)/E] = X(t) + iY(t), \tag{44}$$

where $B$ and $E$ have the dimensions of volts. Then (42) and (43)
become

$$\frac{dX}{dt} = \frac{d\hat f(t)}{dt}\,\frac{X(t)}{E} + \frac{df(t)}{dt}\,\frac{Y(t)}{E} \tag{45}$$

$$\frac{dY}{dt} = \frac{d\hat f(t)}{dt}\,\frac{Y(t)}{E} - \frac{df(t)}{dt}\,\frac{X(t)}{E}. \tag{46}$$

All the signals here are high-pass (lower frequency = $\lambda$), with the
exception of $X(t)$, which we write as

$$X(t) = B + X_0(t), \tag{47}$$

where $X_0(t)$ is high-pass (lower frequency = $\lambda$). Then we rewrite
eqs. (45) and (46) as

$$\frac{dX}{dt} = \frac{dX_0}{dt} = \frac{B}{E}\,\frac{d\hat f(t)}{dt} + \frac{d\hat f(t)}{dt}\,\frac{X_0(t)}{E} + \frac{df(t)}{dt}\,\frac{Y(t)}{E} \tag{48}$$

$$\frac{dY}{dt} = \frac{d\hat f(t)}{dt}\,\frac{Y(t)}{E} - \frac{B}{E}\,\frac{df(t)}{dt} - \frac{df(t)}{dt}\,\frac{X_0(t)}{E}. \tag{49}$$

We multiply all derivatives by some $T$, which has the dimensions of
$t$ (time), since analog circuitry for differentiating $f(t)$ will give
$Tf'(t)$ in volts.
The analog implementation of the two differential eqs. (48) and (49)
is shown in Fig. 6.
High-gain (negative) ac amplifiers are connected as ac integrators
with feedback capacitors $C$ and input resistors $R$, $R$, and $(M/B)R$ to
give

$$RC\,\frac{dY}{dt} = \frac{1}{M}\,T\hat f'(t)\,Y(t) - \frac{B}{M}\,Tf'(t) - \frac{1}{M}\,Tf'(t)\,X_0(t) \tag{50}$$

$$RC\,\frac{dX_0}{dt} = \frac{1}{M}\,T\hat f'(t)\,X_0(t) + \frac{B}{M}\,T\hat f'(t) + \frac{1}{M}\,Tf'(t)\,Y(t). \tag{51}$$

Here $M$ is the multiplier scale factor (volts); it is the output of the
multiplier when both inputs are 1 volt. Comparing these equations
with (48) and (49) we see that the "normalization factor" $E$ introduced
in (44) is

$$E = \frac{M\,RC}{T}. \tag{52}$$

Fig. 6—Analytic exponential modulator.


Coupling networks employing dc blocking condensers are shown at
the inputs and outputs of amplifiers and multipliers. The time constant
of these networks should be large compared to the reciprocal of the
lower frequency $\lambda$. Some of these networks may be incorporated in the
ac amplifiers. The important point is that the inputs to the multipliers
should not have dc components. Then when inputs $Tf'(t)$ and $T\hat f'(t)$
are zero (steady state), the circuit is in a quiescent condition with
$X(t) = B$ and $Y(t) = 0$. If $f'(t)$ and $\hat f'(t)$ were not high-pass, and
Hilbert transform pairs, then ac coupling could not be used, because
the corresponding signals $X(t) - B$ and $Y(t)$ would not be high-pass.
A dc operational amplifier is shown in the feedback loop, connected
to give a gain of $-1$ to the inputs $-X_0(t)$ and $-B$. This is not necessary,
but provides a convenient way to add $B$ to $X_0(t)$ and give a low
impedance output for $X(t)$. The dc amplifier can be replaced by an ac
amplifier (gain $-1$) and the summing of $B$ with $X_0(t)$ incorporated
elsewhere in the external circuitry.
Switching-type multipliers may be used in the AEM if $f'(t)$ and
$\hat f'(t)$ are replaced by binary-valued switching signals, which, if
integrated, give very close approximations to $f(t)$ and $\hat f(t)$. These
switching signals may be obtained from a "delta mod", as shown in Fig. 7. The
resulting error is mainly high frequency in $f(t)$ and $\hat f(t)$, which
translates into high-frequency error in $X(t)$ and $Y(t)$. This error will
subsequently be removed by bandlimiting $X(t)$ and $Y(t)$ to obtain $x(t)$
and $y(t)$.


VI. NUMERICAL EXAMPLE 


To illustrate the theory we take a simple example:

$$\lambda = 1, \quad \mu = 2, \quad \alpha = 2.2, \quad \beta = 2.8, \quad c = 5/2,$$

$$f'(t) = -\tfrac{1}{2}\cos t + \tfrac{1}{4}\cos 2t,$$

$$f(t) = -\tfrac{1}{2}\sin t + \tfrac{1}{8}\sin 2t.$$

We have

$$\hat f(t) = \tfrac{1}{2}\cos t - \tfrac{1}{8}\cos 2t$$

so that

$$-iF(t) = \hat f(t) - if(t) = \tfrac{1}{2}e^{it} - \tfrac{1}{8}e^{i2t}.$$

Then

$$Z(t) = \exp[-iF(t)] = 1 - iF(t) + \frac{[-iF(t)]^2}{2!} + \cdots$$

$$= 1 + \left(\tfrac{1}{2}e^{it} - \tfrac{1}{8}e^{i2t}\right) + \frac{1}{2!}\left(\tfrac{1}{2}e^{it} - \tfrac{1}{8}e^{i2t}\right)^2 + \cdots$$

$$= 1 + \tfrac{1}{2}e^{it} + 0 \cdot e^{i2t} - \tfrac{1}{24}e^{i3t} - \tfrac{1}{192}e^{i4t} + \cdots.$$

Fig. 7—Delta-mod circuit for deriving switching signals for multipliers in
analytic exponential modulator. (Blocks: $f'(t)$ in; integrator; infinite
clipper with small hysteresis; out: switching signal $\approx Tf'(t)$ + high
frequency. A similar circuit is used with $\hat f'(t)$ in.)


Now we bandlimit $Z(t)$ by convolution with $K_{\alpha,\beta}(t)$ to preserve
the spectrum in $[0, \alpha]$ and eliminate frequencies above $\beta$. (Here we
assume $\alpha = 2.2$ and $\beta = 2.8$.) We thus obtain

$$z(t) = 1 + \tfrac{1}{2}e^{it}.$$

Clearly, $z(t + iu)$ is zero-free in the upper half-plane, $u \ge 0$. Then

$$\log z(t) = \tfrac{1}{2}e^{it} - \tfrac{1}{8}e^{i2t} + \tfrac{1}{24}e^{i3t} - \cdots,$$

the first two terms agreeing with

$$\log Z(t) = -iF(t) = \tfrac{1}{2}e^{it} - \tfrac{1}{8}e^{i2t},$$

i.e., the Fourier transforms of $\log z(t)$ and $\log Z(t)$ agree over
$(-\infty, \alpha)$.
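The leading Fourier coefficients of Z(t) = exp[-iF(t)] can be checked with exact rational arithmetic by treating w = e^{it} as a formal variable. The sketch below is an illustration, not part of the original paper; the helper names are assumptions.

```python
from fractions import Fraction

def poly_mul(p, q, order):
    """Product of two coefficient lists, truncated after degree `order`."""
    out = [Fraction(0)] * (order + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= order:
                out[i + j] += a * b
    return out

def exp_series(w, order):
    """Coefficients of exp(w(x)) up to x**order, for w with zero constant term."""
    result = [Fraction(1)] + [Fraction(0)] * order
    term = [Fraction(1)] + [Fraction(0)] * order   # holds w**k / k!
    for k in range(1, order + 1):
        term = [coef / k for coef in poly_mul(term, w, order)]
        result = [r + coef for r, coef in zip(result, term)]
    return result

# -iF(t) = (1/2) e^{it} - (1/8) e^{i2t}, written in powers of x = e^{it}.
minus_iF = [Fraction(0), Fraction(1, 2), Fraction(-1, 8)]
Z = exp_series(minus_iF, 4)
print(Z)  # coefficients of 1, e^{it}, e^{i2t}, e^{i3t}, e^{i4t}
```

With α = 2.2 and β = 2.8, only the constant and the e^{it} term survive the bandlimiting, since the e^{i2t} coefficient vanishes and the remaining terms lie above β.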
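Now we take $\theta = 0$, $c = 5/2$, and form

$$G(t; \theta, c) = z(t) + e^{i2ct}\,\bar z(t)$$
$$= 1 + \tfrac{1}{2}e^{it} + e^{i5t}\left(1 + \tfrac{1}{2}e^{-it}\right)$$
$$= 1 + \tfrac{1}{2}e^{it} + \tfrac{1}{2}e^{i4t} + e^{i5t}.$$

Then $G(\tau)$ has only real simple zeros, since $z(\tau)$ is zero-free in the
(closed) upper half-plane. We have

$$G(t) = 2e^{i5t/2}s(t),$$

where

$$s(t) = \cos\tfrac{5}{2}t + \tfrac{1}{2}\cos\tfrac{3}{2}t$$
$$= \left(1 + \tfrac{1}{2}\cos t\right)\cos\tfrac{5}{2}t + \tfrac{1}{2}\sin t\,\sin\tfrac{5}{2}t.$$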
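Writing $G(t)$ in factored form we have

$$G(t) = (1 + e^{it})\prod_{k=1}^{2}\left(1 - e^{i\theta_k}e^{it}\right)\left(1 - e^{-i\theta_k}e^{it}\right),$$

where

$$\cos\theta_1 = 3/4, \qquad \cos\theta_2 = -1/2.$$

Since $G(\tau)$ and $z(\tau)$ are zero-free in the upper half-plane and the
Fourier transforms of $G(t)$ and $z(t)$ agree over $(-\infty, \alpha)$, it
follows that the Fourier transforms of $\log G(t)$ and $\log z(t)$ agree over
$(-\infty, \alpha)$. We have

$$\log G(t) = \log(1 + e^{it}) + \sum_{k=1}^{2}\log\left(1 - e^{i\theta_k}e^{it}\right) + \sum_{k=1}^{2}\log\left(1 - e^{-i\theta_k}e^{it}\right)$$

or

$$\log G(t) = e^{it} - \tfrac{1}{2}e^{i2t} + \tfrac{1}{3}e^{i3t} - \cdots$$
$$\quad - 2(\cos\theta_1 + \cos\theta_2)\,e^{it} - 2(\cos 2\theta_1 + \cos 2\theta_2)\,\frac{e^{i2t}}{2} - 2(\cos 3\theta_1 + \cos 3\theta_2)\,\frac{e^{i3t}}{3} - \cdots$$

$$= \tfrac{1}{2}e^{it} - \tfrac{1}{8}e^{i2t} + \tfrac{1}{24}e^{i3t} + \tfrac{31}{64}e^{i4t} + \tfrac{121}{160}e^{i5t}$$
$$\quad - \tfrac{145}{384}e^{i6t} + \tfrac{169}{7\cdot 128}e^{i7t} - \tfrac{449}{8\cdot 256}e^{i8t} - \tfrac{1511}{9\cdot 512}e^{i9t}$$
$$\quad - \tfrac{1201}{10\cdot 1024}e^{i10t} + \tfrac{4489}{11\cdot 2048}e^{i11t} - \tfrac{6305}{12\cdot 4096}e^{i12t} + \cdots.$$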
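The coefficients of the log G(t) series can be reproduced with exact arithmetic from the polynomial G = 1 + (1/2)x + (1/2)x^4 + x^5, x = e^{it}. The sketch below is illustrative and not from the original paper; it uses the standard recurrence obtained from G' = G (log G)', and the function name is an assumption.

```python
from fractions import Fraction

def log_series(g, order):
    """Coefficients l[1..order] of log(g(x)) for a polynomial g with g[0] == 1.

    Uses n*g[n] = sum_{k=1..n} k*l[k]*g[n-k], which follows from g' = g*(log g)'.
    """
    g = list(g) + [Fraction(0)] * (order + 1 - len(g))
    l = [Fraction(0)] * (order + 1)
    for n in range(1, order + 1):
        acc = sum((k * l[k] * g[n - k] for k in range(1, n)), Fraction(0))
        l[n] = g[n] - acc / n
    return l

# G(t) = 1 + (1/2)e^{it} + (1/2)e^{i4t} + e^{i5t} in powers of x = e^{it}.
G = [Fraction(1), Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 2), Fraction(1)]
log_G = log_series(G, 12)
for n in range(1, 13):
    print(n, log_G[n])
```

The first three coefficients coincide with those of log z(t), as required for the transforms to agree below α; the later ones describe the "meandering" sawtooth h(t).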


We have defined

$$H(t) = i\log G(t) = h(t) + i\hat h(t).$$

Then

$$h(t) = -\operatorname{Im}\{\log G(t)\} = -\tfrac{1}{2}\sin t + \tfrac{1}{8}\sin 2t - \tfrac{1}{24}\sin 3t - \tfrac{31}{64}\sin 4t + \cdots.$$

This is the "meandering" sawtooth function, which increases by $\pi$ at
the zeros of $s(t)$. (The waveforms in Fig. 1 correspond to the example
here.)

The phase $\phi(t)$ of the analytic bandlimited signal $z(t)$ is

$$\phi(t) = \operatorname{Im}\{\log z(t)\} = \tfrac{1}{2}\sin t - \tfrac{1}{8}\sin 2t + \tfrac{1}{24}\sin 3t - \cdots.$$

We see that the Fourier transforms of $f(t)$, $h(t)$, and $-\phi(t)$ agree over
$(-\alpha, \alpha)$. Then the Fourier transforms of $f'(t)$ and

$$h'(t) = \pi\sum_{k}\delta(t - t_k) - c = -\tfrac{1}{2}\cos t + \tfrac{1}{4}\cos 2t - \tfrac{1}{8}\cos 3t - \tfrac{31}{16}\cos 4t - \cdots$$

agree over $(-\alpha, \alpha)$, where $\{t_k\}$ are the zeros of $s(t)$. [The
Fourier series of $h'(t)$ does not converge, of course.] Thus, if $K(t)$ is the
kernel of a bandpass filter that reproduces frequencies between $\lambda$ and
$\mu$, rejecting dc and frequencies greater than $\alpha$, we have

$$f'(t) = h'(t) * K(t) = \pi\sum_{k}K(t - t_k) = -\tfrac{1}{2}\cos t + \tfrac{1}{4}\cos 2t.$$


Writing

$$z(t) = A(t)e^{i\phi(t)}$$

and

$$s(t) = \tfrac{1}{2}e^{-ict}G(t) = A(t)\cos\{ct - \phi(t)\}$$

we see that $\{t_k\}$ are the zeros of the phase-modulated signal

$$\cos[ct - \phi(t)].$$

Note that $\phi(t)$ is not bandlimited;

$$\phi(t) = \operatorname{Im}\{\log[B_{\alpha,\beta}\,e^{\hat f(t) - if(t)}]\},$$

where $B_{\alpha,\beta}$ is the bandlimiting operator defined by convolution
with $K_{\alpha,\beta}(t)$, so that

$$-\phi(t) = f(t) + \epsilon(t),$$

where $\epsilon(t)$ is a high-pass function whose Fourier transform vanishes
over $(-\alpha, \alpha)$. Had we taken $\alpha$ and $\beta$ much larger, then
$\epsilon(t)$ would have been extremely small, but this would have required a
much larger $c$ $[c \ge (\alpha + \beta)/2]$ in forming $s(t)$, i.e., a much
larger pulse rate in obtaining

$$f'(t) = \pi\sum_{k}K(t - t_k).$$

In the example here we have

$$x(t) = \operatorname{Re}\{z(t)\} = 1 + \tfrac{1}{2}\cos t > 0$$

and therefore

$$-\pi/2 < \phi(t) < \pi/2,$$

so that the zeros of $s(t)$ and $\sin ct$ interlace. Then [cf. (34)] with
$c = 5/2$,

$$q(t) = -\frac{\pi}{2}\{\operatorname{sgn} s(t)\}\cdot\{\operatorname{sgn}\sin ct\} = h(t) - \sigma(t)$$
$$= -\tfrac{1}{2}\sin t + \tfrac{1}{8}\sin 2t - \tfrac{1}{24}\sin 3t - \tfrac{31}{64}\sin 4t - \tfrac{121}{160}\sin 5t - \sin 5t + \cdots$$

is a square wave that may be bandlimited to obtain

$$f(t) = -\tfrac{1}{2}\sin t + \tfrac{1}{8}\sin 2t.$$
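The claim that the square wave carries f(t) in its low-frequency band can be checked by estimating the first Fourier sine coefficients of q(t) directly from its definition. This is a minimal numerical sketch, not part of the original paper; the midpoint-rule integration and the function names are assumptions, with s(t) = cos(5t/2) + (1/2)cos(3t/2) and c = 5/2 taken from the example.

```python
import math

C = 2.5  # carrier frequency of the example

def s(t):
    """Signal of the example: s(t) = cos(5t/2) + (1/2) cos(3t/2)."""
    return math.cos(2.5 * t) + 0.5 * math.cos(1.5 * t)

def sgn(x):
    return (x > 0) - (x < 0)

def q(t):
    """Square wave of (34): q = -(pi/2) sgn s(t) sgn sin(ct)."""
    return -0.5 * math.pi * sgn(s(t)) * sgn(math.sin(C * t))

def sine_coefficient(n, samples=100_000):
    """Midpoint-rule estimate of (1/pi) * integral over [0, 2*pi] of q(t) sin(n t)."""
    dt = 2 * math.pi / samples
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * dt
        total += q(t) * math.sin(n * t)
    return total * dt / math.pi

print(sine_coefficient(1))  # close to -1/2
print(sine_coefficient(2))  # close to  1/8
```

Since q(t) is discontinuous, the midpoint rule converges only linearly in the step size, so the estimates agree with -1/2 and 1/8 to about three decimal places at this sample count.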
If we replace $f(t)$ by

$$f(t) = -a\left(\sin t - \tfrac{1}{4}\sin 2t\right)$$

we find that

$$z(t) = 1 + ae^{it} + \frac{a}{2}\left(a - \tfrac{1}{2}\right)e^{i2t}.$$

Then $z(\tau)$ will be zero-free in the upper half-plane for

$$-\frac{\sqrt{33} - 1}{4} < a < \frac{\sqrt{33} + 1}{4}$$

$(-1.186141 < a < 1.68614)$.

However, the real part of $z(t)$ will be positive, i.e., the square wave
representation will be valid, only for the reduced range

$$-1.06 < a < 1.38.$$
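Both ranges of a can be checked numerically. With w = e^{iτ}, the upper half-plane maps to the unit disk, so z(τ) is zero-free there when the polynomial 1 + a w + (a/2)(a - 1/2) w² has no zero with |w| ≤ 1. The sketch below is an illustration under that reading; the function names and the sampling grid are assumptions.

```python
import cmath
import math

def z_coeffs(a):
    """Coefficients (1, a, b) of z = 1 + a*w + b*w**2, with b = (a/2)(a - 1/2)."""
    return 1.0, a, 0.5 * a * (a - 0.5)

def zero_free_upper_half_plane(a):
    """True if 1 + a*w + b*w**2 has no zero in the closed unit disk |w| <= 1."""
    _, a1, b = z_coeffs(a)
    if b == 0.0:
        return a1 == 0.0 or abs(1.0 / a1) > 1.0
    disc = cmath.sqrt(a1 * a1 - 4.0 * b)
    roots = ((-a1 + disc) / (2.0 * b), (-a1 - disc) / (2.0 * b))
    return all(abs(w) > 1.0 for w in roots)

def real_part_positive(a, samples=4000):
    """True if Re z(t) = 1 + a cos t + b cos 2t is positive on a grid over one period."""
    _, a1, b = z_coeffs(a)
    return all(
        1.0 + a1 * math.cos(t) + b * math.cos(2 * t) > 0.0
        for t in (2 * math.pi * k / samples for k in range(samples))
    )

for a in (-1.19, -1.18, -1.07, -1.06, 0.5, 1.38, 1.39, 1.68, 1.69):
    print(a, zero_free_upper_half_plane(a), real_part_positive(a))
```

The grid test for the real part is only a sampled check; near the ends of the range the minimum of Re z(t) is close to zero, so a finer grid or the closed-form minimum of the quadratic in cos t would be needed for a sharper bound.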
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Single-Mode Fibers 
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Dispersion, cutoff wavelength, and mode field radius requirements place 
conflicting demands on the design of single-mode fibers. Using empirical 
models that relate these requirements to the core parameters in the preform 
stage, we present a single-mode design diagram for a depressed cladding fiber 
design showing the regions of core diameter and index difference that satisfy 
the requirements. The requirements with the greatest impact on fiber yield 
are the maximum value of the zero dispersion wavelength, the maximum value 
of the cutoff wavelength, and the allowable variations in the mode field radius. 


I. INTRODUCTION 


The optical requirements on a single-mode fiber impose conflicting 
demands on the fiber design. To meet dispersion, cutoff wavelength, 
and mode field radius requirements, the fiber core diameter and index 
of refraction difference must be controlled to within small variations 
about a nominal value, and the tightness of these specifications will 
directly affect the yield when making the fiber. In this paper, we 
examine the effects of fiber requirements on yield and identify those 
requirements that have the greatest impact. 

This study will deal with a depressed cladding fiber design, whose
index profile is shown with its nominal parameters in Fig. 1. This fiber
design has been thoroughly studied. Considerable measurement data
are available for this type of fiber with wide variations in fiber
parameters about the nominal values except for the cladding index
($\Delta^-$) and clad-to-core diameter ratio ($D/d$), which were well controlled
at the nominal values shown in Fig. 1. These data have been used to
develop simple empirical models that relate the core parameters ($d$
and $\Delta$) to the dispersion, cutoff wavelength, and mode field radius.
Since the index-of-refraction profiles are more easily measured for
preforms than for fibers, core dimensions scaled from their preform
values were used. The index of refraction measured in the fiber will
differ from that measured in the preform,’ but as long as the preform
values are used consistently, the validity of the empirical models
should not be affected. Also, models based upon preform measurements
are more useful for manufacturing.

Fig. 1—Refractive index profile of a depressed cladding fiber. Nominal parameters
are shown: $d = 8.3\ \mu$m, $D = 6.7d$, OD $= 125\ \mu$m, $\Delta = 0.37$,
$\Delta^+ = 0.26$ (preform values in percent).

In Section II we discuss the models in detail, and in Section III we 
present the single-mode design diagrams, which show the relationship 
between system requirements and fiber parameters. 


II. EMPIRICAL MODELS


Chromatic dispersion, which limits the bit rate of a digital system
operating on a single-mode fiber, passes through zero at a wavelength
($\lambda_0$) near 1.3 $\mu$m. Since the shape of the chromatic dispersion curve
as a function of wavelength for a nearly step-index fiber is not sensitive
to the fiber design parameters, specifying $\lambda_0$ is sufficient to ensure
that the fiber dispersion is adequately low over the range of wavelengths
that may be used in the system. The dispersion in a single-mode fiber
may be predicted by numerical solutions of the field
equations.” However, these solutions themselves give little insight into
the dependence of $\lambda_0$ on core index parameters. Accordingly, a set of
values encompassing the expected range of core diameters ($7 < d < 10$
$\mu$m) and index differences ($0.15\% < \Delta < 0.50\%$) were used in a program
that solves the scalar wave equation numerically,” and the $\lambda_0$ for each
combination was computed. The core parameters, $d$ and $\Delta$, and $\lambda_0$
were then used in a multiple linear regression program to find a simple,
accurate relationship between $\lambda_0$ and $d$ and $\Delta$. After trying several
different powers of $d$ and $\Delta$, the following model was selected:


do = Co + + COVA. (1) 


For example, after we compared the computed $\lambda_0$'s with eq. (1), we
obtained a correlation coefficient ($\rho^2$) of 0.996 for a family of fibers
with nearly step-index Ge-doped cores and pure SiO$_2$ claddings
($\Delta^- = 0$, or “matched”). $\lambda_0$ was calculated for a limited number of
alternative profile shapes and dopants, and the same functional form
with slightly different coefficients was an excellent fit to the computed
values of $\lambda_0$ in every case. This model is also appropriate for depressed
cladding fibers if the cladding diameter is large relative to the core
diameter (i.e., $D/d > 5$), so that the dispersion behavior is similar to
that of a matched cladding fiber. To use this model empirically, a set
of coefficients that characterize measured, not computed, data was
found. A linear regression analysis of measured values of $\lambda_0$ for a group
of depressed-cladding fibers as a function of core diameter and delta
gave the following best-fit coefficients:°

$$C_0 = 1.207$$
$$C_1 = 1.933$$
$$C_2 = -0.2149$$

with $\rho^2 = 0.94$. If we use these coefficients, eq. (1) gives $\lambda_0$ in
micrometers if $d$ is expressed in micrometers and $\Delta$ in percent. While core
diameter and delta were varied, $\Delta^-$ and $D/d$ were held at the nominal
values, shown in Fig. 1.

Cutoff wavelength can also be modeled. For an ideal step-index
fiber, the theoretical cutoff wavelength is

$$\lambda_{cth} = \frac{\pi d n\sqrt{2\Delta}}{2.405}, \tag{2}$$

where $n$ is the index of refraction of the cladding. The measured cutoff
wavelength, $\lambda_c$, for the standard 5-meter sample length is well
characterized by the empirical relation


2 2 
(*) = 0.861 (*=) + 0.0006 (3) 


with $\rho^2 = 0.92$. The $\lambda$'s and $d$'s are measured in micrometers. The
measured cutoff wavelength is expected to be lower than the theoretical
cutoff wavelength in depressed cladding designs since the higher-order
mode becomes leaky and highly attenuated well below $\lambda_{cth}$.
Defining the normalized frequency as

$$V = \frac{\pi d n}{\lambda}\sqrt{2\Delta},$$

then eq. (3) corresponds to a cutoff $V$ value of approximately 2.61
instead of the theoretical value of 2.405 for an ideal step-index
matched-cladding fiber.

The width of the field distribution in the core of the fiber (the mode
field radius, or $w_0$) is important in controlling splice loss and laser
launching efficiency. Since the fields are nearly Gaussian,* the width
parameter of the Gaussian function that best fits the near field
adequately characterizes the field distribution. Marcuse’ has found the
following empirical relationship:

$$\frac{2w_0}{d} = 0.65 + \frac{1.619}{V^{3/2}} + \frac{2.879}{V^6}. \tag{4}$$

This empirical formula has been found to agree well with measured
values.

Fig. 2—Constraints imposed by tolerances on $\lambda_0$, $\lambda_c$, and $w_0$. The
trapezoidal region represents all combinations of $d$ and $\Delta$ that meet the
requirements placed upon $\lambda_0$, $\lambda_c$, and $w_0$. (Axes: preform delta in
percent versus core diameter in micrometers.)
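The step-index relations used in this section can be evaluated for the nominal fiber of Fig. 1. The sketch below is illustrative only: it uses the standard formulas V = (π d n/λ)√(2Δ), λ_cth = π d n √(2Δ)/2.405, and Marcuse's relation (4); the cladding index n = 1.458 and the operating wavelength 1.3 μm are assumed values not stated in the text, and the function names are hypothetical.

```python
import math

def normalized_frequency(d_um, delta_pct, wavelength_um, n):
    """V = (pi * d * n / lambda) * sqrt(2 * Delta), with Delta given in percent."""
    return math.pi * d_um * n * math.sqrt(2 * delta_pct / 100.0) / wavelength_um

def theoretical_cutoff(d_um, delta_pct, n):
    """Wavelength (micrometers) at which V equals 2.405 for an ideal step-index fiber."""
    return math.pi * d_um * n * math.sqrt(2 * delta_pct / 100.0) / 2.405

def mode_field_radius(d_um, delta_pct, wavelength_um, n):
    """Marcuse's relation: 2*w0/d = 0.65 + 1.619/V**1.5 + 2.879/V**6."""
    v = normalized_frequency(d_um, delta_pct, wavelength_um, n)
    return 0.5 * d_um * (0.65 + 1.619 / v ** 1.5 + 2.879 / v ** 6)

N_CLAD = 1.458      # assumed cladding index (not given in the text)
WAVELENGTH = 1.3    # assumed operating wavelength in micrometers

V = normalized_frequency(8.3, 0.37, WAVELENGTH, N_CLAD)
cutoff = theoretical_cutoff(8.3, 0.37, N_CLAD)
w0 = mode_field_radius(8.3, 0.37, WAVELENGTH, N_CLAD)
print(V, cutoff, w0)
```

Under these assumptions the nominal design gives a mode field radius of roughly 4.4 μm, inside the 4.30 to 4.75 μm window used as an example requirement later in the paper.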


III. SINGLE-MODE FIBER DESIGN DIAGRAMS


Consider the following system requirements as an example:

$$1.300 < \lambda_0 \le 1.325\ \mu\text{m}$$
$$1.10 < \lambda_c \le 1.25\ \mu\text{m}$$
$$4.30 < w_0 \le 4.75\ \mu\text{m}.$$


Using the empirical models developed in the previous section, curves
of constant $\lambda_0$, $\lambda_c$, and $w_0$ corresponding to these limits are
plotted as a function of $d$ and $\Delta$ in Fig. 2. (A somewhat similar diagram
for matched-cladding fibers has been published by Ainslie et al.°) The
area inside the nearly trapezoidal shaded region represents the allowable
combinations of $d$ and $\Delta$ required to meet dispersion, cutoff, and
mode field radius requirements. Several conclusions may immediately
be drawn. First, only the upper limit on $\lambda_0$ will affect yield. The lower
limit will not be approached if the cutoff wavelength and mode field
radius requirements are met. Second, $\lambda_c > 1.10\ \mu$m if all other
requirements are met, so the minimum $\lambda_c$ requirement also appears to be
unnecessary. Finally, the tolerance on $w_0$ ($\pm 5$ percent) does
significantly reduce the size of the allowable region. The effect of the
various parameter values on the size of the allowable region is shown in
Fig. 3.

Fig. 3—Sensitivity to parameter tolerances. Changing the maximum $\lambda_0$,
$\lambda_c$, or $w_0$ can severely alter the size of the allowable region. (Axes:
preform delta in percent versus core diameter in micrometers.)

These diagrams also provide immediate feedback before fibers are 
drawn from the preforms. If a preform profile indicates that the fibers 
drawn from the preform will not have the desired properties, then the 
preform may be scrapped to avoid drawing and measuring fibers that 
will not be acceptable. 
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Single GRIN-Lens Directional Couplers 


By F. H. LEVINSON* and S. W. GRANLUND†
(Manuscript received June 10, 1983) 


Two types of small, rugged, gradient-index (GRIN) rod-lens directional 
couplers have been made. One is a three-port directional coupler and the other 
is an asymmetric four-port coupler. Each coupler can have various splitting 
ratios from 1:1 to 20:1. Excess insertion loss is less than 1.0 dB, and directivity 
is greater than 30 dB for the three-port couplers, while these same figures are 
1.2 dB and 40 dB for the asymmetric four-port couplers. The couplers are 
stable between —40 and +80°C. 


I. INTRODUCTION 


Several designs for directional couplers that use gradient index 
(GRIN) rod lenses have been published; many of these are discussed 
in the review by Uchida and Kobayashi.’ We report the results of two 
multimode fiber coupler designs. The first is a three-port directional 
coupler similar to that of Nishimoto et al.,? described in Ref. 1. Our 
design, however, incorporates a single GRIN lens and a unique angled 
mirror for performing the power division. The other coupler is an 
asymmetric four-port that is useful in fiber networking applications. 
Waveguide and notched fiber manifestations of this coupler exist.*“
This GRIN-lens design is more flexible and useful for a wide range of
fibers and applications. The three-port coupler is described first since
its fabrication, performance, and stability are similar to that of the 
asymmetric four-port. Following this, details specific to the four-port 
coupler are presented. 


* AT&T Bell Laboratories; present affiliation Bell Communications
Research, Inc. † AT&T Bell Laboratories.

II. THREE-PORT DIRECTIONAL COUPLER
2.1 Design


The upper portion of Fig. 1 presents the design of the three-port 
directional coupler with a 1:1 or 3 dB splitting ratio. Light from fiber 
1 passes through the quarter-pitch lens and is collimated at the end 
face. Collimated light from a GRIN lens is not normally parallel to 
the lens axis, but is usually at an angle related to the distance of fiber 
1 from the lens axis. Any change in the direction of this beam, other 
than 180 degrees, after reflection from the lens end-face mirror results 
in a positional change when the beam is refocused back to the original 
plane of incidence. In this way, the part of the light from fiber 1 that 
strikes the portion of end face covered by the fixed mirror (evaporated 
Au deposited directly on the lens) is reflected symmetrically about the 
lens axis and refocused into fiber 2. The remaining light is reflected 
off the tilted mirror and can be refocused into fiber 3 by precise 
angular adjustment of that mirror. 


2.2 Fabrication 


The assembly of the three-port device is accomplished in two independent
steps. First, an array of three fibers is brought near the lens
and permanently set with its position adjusted for optimum light
coupling between the input fiber and either of the two output fibers.
Next, a mirror is brought near the lens end face. After the lens-mirror
joint is filled with an optical cement, which has an index similar to
that of the fiber and GRIN lens, the mirror is angularly adjusted and
permanently set for optimum coupling between the input fiber and
the remaining output fiber. An angle of ~1.3° is typical for a SLW
1.8-mm Selfoc* lens and fibers with outer diameters (ODs) of 125 μm.
An important aspect of this two-step assembly process is the independent
alignment of the fibers, which results in optimum coupling.
The lower portion of Fig. 1 contains a photograph of a completed
miniature three-port directional coupler. The overall length of this
device is 1.5 cm; however, couplers as short as 1 cm were fabricated.

Fig. 1—Schematic diagram of GRIN-lens three-port directional coupler and
photograph of a finished device.

An additional step can be taken to compensate for the cement’s
shrinkage during curing by increasing the tilted mirror’s angle and the
fibers’ distance from the lens before curing by an amount proportional
to the shrinkage. However, cement shrinkage during curing is small,
and when a device is made (using 50/125-μm fiber) without compensating
for cement shrinkage during cure, a degradation of at most 0.1 to
0.2 dB results.


2.3 Performance characteristics 


A flexibility inherent in both the three-port and four-port coupler 
designs is the change in the splitting ratio that results from varying 
the area of the end-face mirrors. As Nishimoto et al. reported,’ we 
have found a fan-shaped mirror to be the best approach in order to 
minimize modal sensitivity of the coupler. Couplers have been fabri- 
cated with splitting ratios of 1:1, 5:1, 10:1, and 20:1. 

The coupler loss and crosstalk measurements were made using either
a λ = 0.87-μm or λ = 1.3-μm Light Emitting Diode (LED) light source
and a calibrated optical power meter. Light from the LED was coupled
to a length of fiber that includes a mandrel wrap to provide a steady-state
mode excitation. This fiber was then spliced to the device fiber
pigtail and the signal level (after a mandrel wrap) on the output fibers
($P_{out}$) was recorded. Following the necessary optical power measurements,
the input fiber was broken to obtain the input-power zero point
($P_{in}$). The insertion loss of a particular channel is defined to be
$10\log(P_{out}/P_{in})$. The excess insertion loss is defined to be that
portion of the insertion loss that is not attributable to the power division.
For example, if a three-port coupler is designed to split the input power
equally between the two outputs, then, ideally, each output port would
have a 3-dB insertion loss. If each port shows a 3.8-dB insertion loss,
then the coupler has a 0.8-dB excess insertion loss.
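The bookkeeping behind the excess insertion loss can be sketched as follows. This is an illustration, not part of the original paper: it assumes an ideal lossless splitter with ratio r:1, quotes losses as positive decibel values, and uses hypothetical function names.

```python
import math

def theoretical_split_losses_db(ratio):
    """Ideal insertion losses (dB) of the two output ports for an r:1 power split."""
    low_power_port = 10 * math.log10(ratio + 1)             # receives 1/(r+1) of the input
    high_power_port = 10 * math.log10((ratio + 1) / ratio)  # receives r/(r+1) of the input
    return low_power_port, high_power_port

def excess_loss_db(measured_db, theoretical_db):
    """Portion of the measured insertion loss not attributable to the power division."""
    return measured_db - theoretical_db

for ratio in (1, 5, 10, 20):
    print(ratio, theoretical_split_losses_db(ratio))
print(excess_loss_db(3.8, 3.0))
```

For a 1:1 split both ports ideally lose about 3.0 dB, so a measured 3.8 dB corresponds to the 0.8-dB excess loss quoted in the text.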

The excess insertion loss of the experimental coupler models was
between 0.6 and 1.0 dB (see Table I). As a result, although couplers
with splitting ratios higher than 20:1 are possible, there is little
performance improvement in excess insertion loss to be realized on
the high reflection channel over that of a 20:1 coupler. Minimum
excess insertion loss in the coupler appears to be attributable to the
following sources: 0.2 dB from lens aberration,’ 0.1-dB Au mirror loss,°®
and 0.3-dB misalignment and cement curing losses.

* Registered trademark of Nippon Sheet Glass.

The directivity of the three-port coupler was measured by determin- 
ing the coupler’s insertion loss between fibers 2 and 3 (see Fig. 1), 
while the end of fiber 1 was immersed in index matching fluid. An 
average of 32 dB was found for a 1:1 splitting-ratio coupler, and the 
directivity increases with increasing splitting ratio. The dominant 
source of this crosstalk is the light from fiber 2 or 3 that is reflected 
by the fiber 1-cement interface and refocused back into fibers 3 or 2, 
respectively. A reflection also occurs at the lens-cement interface, but
this is not properly focused and as such does not significantly contribute
to the crosstalk. Using the simple Fresnel formula with a fiber
index, $n_f$, of ~1.46 and a cement index, $n_c$, of ~1.56,

$$R = \left(\frac{n_f - n_c}{n_f + n_c}\right)^2 = 0.11\%\ \text{or}\ 29.6\ \text{dB}.$$


Practical directivity values are better than this because we are making 
two passes through a 50-percent transmission mirror (3-dB coupler 
case), and hence, this provides an additional 6 dB of directivity. Some 
of our best 3-dB couplers have approached this theoretical limit of 35 
dB. 
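The directivity estimate can be reproduced with a few lines of arithmetic. The sketch below is illustrative; it uses the normal-incidence Fresnel reflectance and the indices quoted in the text, and the function names are assumptions.

```python
import math

def fresnel_reflectance(n1, n2):
    """Normal-incidence power reflectance at an interface between indices n1 and n2."""
    return ((n1 - n2) / (n1 + n2)) ** 2

def to_db(power_ratio):
    """Express a power ratio as a positive loss in decibels."""
    return -10 * math.log10(power_ratio)

reflectance = fresnel_reflectance(1.46, 1.56)   # fiber and cement indices from the text
interface_db = to_db(reflectance)
limit_db = interface_db + 6.0                   # two passes through a 50-percent mirror

print(round(100 * reflectance, 2), round(interface_db, 1), round(limit_db, 1))
```

The two passes through the 3-dB mirror add 6 dB to the 29.6 dB of the interface reflection, which gives the limit of about 35 dB cited in the text.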


Table I—Insertion losses for three-port directional couplers


                        INSERTION LOSS IN DECIBELS
SPLITTING   MIRROR
RATIO       SHAPE       THEORETICAL   ACTUAL*    EXCESS
1:1         [symbol]    3.0/3.0       3.8/3.8    0.8
5:1         [symbol]    7.8/0.8       8.8/1.7    0.9
10:1        [symbol]    10.4/0.4      10.0/1.4   0.8
20:1        [symbol]    13.02/0.2     13.0/1.0   0.8


* Actual losses of more than 20 devices show approximately 0.3-dB
variation about these means.
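
The theoretical column of Table I is consistent with an ideal, lossless power split. The sketch below assumes that a ratio of r:1 sends r/(r + 1) of the power to one port and 1/(r + 1) to the other; the helper names are hypothetical and are not taken from the paper.

```python
import math

def theoretical_losses_db(ratio: float) -> tuple[float, float]:
    """Ideal insertion losses (dB) of a lossless ratio:1 splitter.

    Returns (loss on the low-power port, loss on the high-power port).
    """
    total = ratio + 1.0
    low_power_port = 10.0 * math.log10(total)           # receives 1 part
    high_power_port = 10.0 * math.log10(total / ratio)  # receives `ratio` parts
    return low_power_port, high_power_port

def excess_loss_db(actual_db: float, theoretical_db: float) -> float:
    """Excess insertion loss: measured loss minus the ideal splitting loss."""
    return actual_db - theoretical_db

for ratio in (1, 5, 10, 20):
    low, high = theoretical_losses_db(ratio)
    print(f"{ratio}:1 -> {low:.1f}/{high:.1f} dB")

# Example from the text: a 3.8-dB port on an ideal 3-dB coupler.
print(f"excess = {excess_loss_db(3.8, 3.0):.1f} dB")
```

Under this assumption the 20:1 case evaluates to about 13.2/0.2 dB, which suggests that the 13.02 entry printed in the table may be a scanning artifact.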
The couplers can also be constructed with relatively good spectral
independence. Longer wavelengths increase the GRIN-lens focal
length, but this sensitivity can be reduced by optimizing for an average
of the system wavelengths. For example, using 50-µm core multimode
fiber in a dual wavelength system at λ = 0.87 µm and 1.3 µm, a focus
set for λ = 1.05 µm degrades the performance of the coupler by less
than 0.2 dB.


2.4 Temperature stability 


Temperature stability of the coupler was investigated, and variations
in performance of <0.2 dB were observed over the range −40 to +80 °C.
Devices were subjected to 100 hours of cycling, 1 hour at each extreme.
Figure 2 shows the stability of the coupler for the 1 → 2 and 1 → 3
channels. The plotted data are from 100 cycles of a device between
−40 and +80 °C. The hatch-shaded region is the performance of the
fixed-mirror channel, while the dot-shaded region is the performance
of the tilted-mirror channel. The ±0.1-dB variation of the fixed-mirror
reflection is attributed to lateral cure strains in the fiber-lens joint.
The ±0.2-dB insertion loss variation in the light reflected from the
tilted mirror is the sum of two separable small shifts: the ±0.1-dB
variation of the fiber-lens joint is still present, and an additional
±0.1 dB is attributable to the thermal expansion and contraction of the
cement wedge at the mirror. For a mirror bonded at a 1.3-degree angle
to a 1.8-mm diameter lens, there is a thickness variation of 41 µm. If
the cement has an average thermal expansion coefficient of 100 × 10⁻⁶
mm/mm/°C, the angle can vary by ±0.6 percent, which translates to
a ~±1-µm shift of the focal point in the focal plane or a ±0.1-dB
variation with temperature. As for the improved performance at higher
temperature, we believe it can be attributed to thermal relaxation of
the curing stresses and cement contraction that disturbed the original
alignments.

Fig. 2—Plot of coupler excess loss variation versus temperature. The
hatch-shaded region is the coupler performance for the fixed mirror
channel; the dot-shaded region is for the tilted mirror.

Fig. 3—Schematic diagram of GRIN-lens asymmetric four-port coupler.
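
The cement-wedge figures in Section 2.4 can be reproduced with a few lines of arithmetic. This sketch assumes the wedge thickness is simply the lens diameter times the tangent of the mirror angle, and that the ±0.6 percent figure corresponds to a ±60 °C excursion about the middle of the −40 to +80 °C range; both are interpretations, not statements from the paper.

```python
import math

lens_diameter_mm = 1.8
wedge_angle_deg = 1.3
expansion_per_degc = 100e-6      # mm/mm/°C, value quoted in the text

# Thickness difference across the lens face for a 1.3-degree wedge.
thickness_um = lens_diameter_mm * math.tan(math.radians(wedge_angle_deg)) * 1000.0

# Assumption: the temperature swings +/-60 °C about a 20 °C midpoint.
delta_t_degc = 60.0
fractional_change = expansion_per_degc * delta_t_degc

print(f"wedge thickness variation = {thickness_um:.0f} um")
print(f"angle variation = +/-{fractional_change * 100:.1f} percent")
```

The conversion from angle change to the ~1-µm focal shift depends on the lens focal length, which the text does not give, so it is left out of the sketch.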


III. FOUR-PORT DESIGN


Figure 3 gives the design of an asymmetric four-port coupler. This
component couples light passively on a throughput [line in (Lin) →
line out (Lout)] channel; a portion of the signal is also tapped. The
same modes of the input fiber, Lin, which are diverted to the R
(receiver) fiber, are available to be refilled with modes from the T
(transmitter) fiber. This data-bus transceiver coupler allows for the
“listen while talking” function important to the Ethernet* or Carrier-
Sense Multiple-Access with Collision Detection (CSMA/CD) local-
area network protocol since the T and R fibers are optically isolated
by more than 40 dB. In addition to being useful in CSMA/CD-based
fiber networks, these couplers are also important in fiber-optic
fail-safe nodes.’

Table II—Insertion losses for four-port directional couplers

                        Insertion Loss (dB)
Device   Channel        Theoretical   Actual
1        Lin → R        7.8           8.8
         T → Lout       7.8           10.3
         Lin → Lout     0.8           1.7
         T → R                        46.5
         excess                       1.2
2        Lin → R        3.0           3.8
         T → Lout       3.0           4.2
         Lin → Lout     3.0           4.0
         T → R                        43.5
         excess                       1.0
3        Lin → R        0.8           1.6
         T → Lout       0.8           1.8
         Lin → Lout     7.8           7.0
         T → R                        42.0
         excess                       0.6

The one difference in fabrication lies in the arrangement of the 
fibers on the GRIN-lens face. In the three-port coupler the fibers are 
held in a close-packed triangular array; in the asymmetric four-port 
coupler they must be held on the corners of a square (actually, a 
rhombus will do). This is because the Lin → R and T → Lout couplings
are achieved simultaneously by reflections from the fixed mirror when
the four fibers are arranged symmetrically about the GRIN-lens axis
(see Fig. 3). The Lin → Lout coupling is made using a tilted mirror
similar to that described in the three-port coupler. High isolation
between the T and R ports results because the remaining portion of
the beam from fiber T is directed (by the tilted mirror) far from fiber
R.

Since the coupler is so very similar to the three-port coupler
previously described, it is not surprising that its temperature
stability and wavelength insensitivity are the same. Excess
insertion losses less than 1.0 dB are obtained for this coupler as well.
Table II summarizes the performance for three different splitting
ratios of this coupler.

* Registered trademark of Xerox Corporation.

Consistent asymmetric four-port coupler performance is (at present)
somewhat more difficult to achieve because of the simultaneous Lin →
R and T → Lout alignment condition required by the device. This
condition is easier to meet if a larger core fiber is used at the receiver 
port. Such large core fibers are common in receiver designs to promote 
better coupling. For the asymmetric four-port, such a fiber choice 
allows for the four fibers to be on the corners of the square with a 
lower precision and still not affect the overall excess insertion loss of 
the device. 

This occurs in the following way. Alignment of the square fiber
array is made by optimizing light from T → Lout. Then the light from
Lin → R is checked. When port R is a large fiber, all that is required
for excellent coupling is that the image of Lin falls somewhere inside
the R fiber core area. If a 50/125-µm fiber is used at the input and an
80/125-µm fiber is at port R, then this fiber pair can have a ±15-µm
error with respect to the T/Lout pair and the corners of the square.
The losses in Table II are for the case where all four ports are 50/125-
µm fiber. An average of 0.2-dB improvement is seen for the case of a
large core 80/125-µm fiber at the R port.


IV. CONCLUSION 


Small, ~1-cm, three- and four-port directional couplers have been 
designed for independent alignment of the fibers, for freedom in 
splitting-ratio variation, and for a certain degree of spectral independ- 
ence. These couplers have also been tested for stable operation between 
—40 and +80 °C. 

These GRIN-lens couplers typically exhibit 0.8-dB excess insertion 
loss. The splitting ratio of the coupler can be varied by changing the 
area of a deposited Au mirror on the lens end face. Fiber-optic local- 
area network transceivers can be constructed using the asymmetric 
four-port coupler. 
Warping Processor for Isolated Word
Recognition
(Manuscript received August 1, 1983)


Special-purpose hardware for calculating dynamic time-warp distances has 
been designed and tested utilizing technology. The Dynamic Time-Warp 
Processor (DTWP) performs all of the necessary arithmetic and decision- 
making operations for selecting a word from a given vocabulary based on log 
likelihood distance measurements. The speed limitation in previously designed 
hardware was due to programmed decision making (often referred to as 
combinatorics). The combinatorics have been implemented in hardware in 
such a way that the decisions are made in the time of several gate delays 
rather than the time of several program cycles. Thus, a dynamic time warp 
(DTW) is performed on typical 40-frame templates in less than one milli- 
second. The DTWP serves as a slave to a 16-bit microcomputer. It performs 
all of the computation and control necessary for pattern classification, and is 
now operating on the board level. The processor is now being implemented 
for very large-scale integration. All logic has been designed in 2.5-µm
Complementary Metal-Oxide Semiconductor polycells and has been simulated on
the Metal-Oxide Semiconductor Timing Simulator (MOTIS). The timing 
simulations indicate that the DTW time of 1 ms implemented at the board 
level can also be met on the integrated circuit. 


I. INTRODUCTION 


The typical isolated speech recognition system attempts to recognize 


* AT&T Bell Laboratories. 
an unknown utterance by comparing the unknown with each of a
number of previously stored reference templates. The recognition 
accuracy of speech is substantially increased when variations in the 
rate of speech production are taken into consideration. This is accom- 
plished by dynamically warping the time axis of the reference utterance 
to the unknown utterance so that the minimum difference is found. 
In this way the majority of temporal variation in the speech is removed 
while the underlying spectral sequential structure is preserved. We 
have used the normalize-and-warp procedure of Myers, Rabiner, and 
Rosenberg,’ with the global constraints proposed by Sakoe and Chiba.” 
The well-known Itakura algorithm has been implemented in a manner 
similar to that described by Ackenhusen and Rabiner.® The identifi- 
cation of the unknown is based on the minimum dynamic time-warp 
(DTW) distance obtained over the set of reference templates. 

Let us assume that we are given a characterization of the isolated 
word, which consists of a set of N vectors of Linear Predictive Coding 
(LPC) coefficients. The test pattern, T, is represented as: 


$$T = \{T(1), T(2), \ldots, T(N)\}, \qquad (1)$$


where the vector $T(i)$ is a spectral (LPC) representation of the $i$th
frame of the test word. In our system a set of nine autocorrelations
constitutes the vector from which an eighth-order LPC model is
derived. The duration of the test utterance is $N$ frames, where each
frame represents 45 ms of speech, and adjacent frames are spaced 15
ms apart.

For a given vocabulary of $V$ words, the reference vector, $R_v$, is
represented as:


$$R_v = \{R(1), R(2), \ldots, R(M_v)\}, \qquad (2)$$


where each vector, $R(j)$, is again a spectral representation of the
corresponding $j$th frame within the reference utterance, and $M_v$ is
the number of frames in the $v$th reference.

To optimally align the time scale of the reference pattern (the 
dependent m index) to the test pattern (the independent n index), we 
must solve for a warping path function of the form: 


$$m = w(n) \qquad (3)$$

and thereby seek to minimize the total distance

$$D = \sum_{n=1}^{N} d\bigl(T(n), R(w(n))\bigr) \qquad (4)$$

over all possible paths, $w(n)$, within the constraints, where
$d(T(n), R(m))$ is the local distance between test frame $n$ and
reference frame $m = w(n)$. This operation must be performed for each
reference vector, $R_v$, in the vocabulary. The test pattern is
classified as belonging to
the class for which the smallest accumulated DTW distance, D, is 
obtained. In addition to the standard DTW algorithm, we have em- 
ployed linear time normalization of the normalize-and-warp procedure 
described by Myers et al.,’ thus allowing the widest range of time 
alignment paths to be considered. This procedure linearly normalizes 
the test and reference utterances to a fixed length (in this case 40 
frames) before the DTW is performed. This prenormalization greatly 
simplifies the processor design since the control logic can be fixed 
rather than have to respond to variables associated with warps of 
varying length. 

Dynamic time warping has been implemented based on the Itakura‘* 
constraints as follows: 


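$$w(1) = 1 \qquad (5)$$

$$w(40) = 40 \qquad (6)$$

$$w(n) - w(n-1) \in \begin{cases} \{0, 1, 2\}, & w(n-1) \neq w(n-2) \\ \{1, 2\}, & w(n-1) = w(n-2) \end{cases} \qquad (7)$$

where accumulated distance is given by

$$D(n, m) = d(T(n), R(m)) + \min\bigl[\, D(n-1, m)\, g(n-1, m),\; D(n-1, m-1),\; D(n-1, m-2) \,\bigr] \qquad (8)$$

and

$$D(1, 1) = d(T(1), R(1)). \qquad (9)$$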


Constraints (5) and (6) require the endpoints of the test and reference 
to match. Time alignment is performed between the endpoints con- 
strained locally by (7). Basically, constraint (7) and eq. (8) allow the 
reference utterance to be time compressed by skipping one frame for 
each test frame. Time stretching is accomplished by duplicating a 
reference frame. This constraint limits the number of times a reference 
frame may be repeated to one, as shown in Fig. 1a. The path marked 
“x” is not allowed. 
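
The recursion and local constraint described above can be sketched in a few lines. This is an illustrative software model, not the processor's implementation: it assumes that the weight g(n − 1, m) acts as an infinite penalty when the point (n − 1, m) was itself reached by a horizontal step, it breaks ties in favor of the non-horizontal predecessors, and it omits the global band constraint. The function name and the scalar test data are hypothetical.

```python
import math

INF = math.inf

def dtw_itakura(test, ref, dist):
    """Accumulated DTW distance between two equal-length frame sequences.

    Allowed steps in the reference index are 0, 1, or 2 per test frame,
    and a horizontal step (0) may not be taken twice in a row.
    """
    n_frames = len(test)
    if len(ref) != n_frames:
        raise ValueError("sequences must be normalized to the same length")

    prev = [INF] * n_frames          # accumulated distances for column n-1
    prev_flat = [False] * n_frames   # True if (n-1, m) was reached horizontally
    prev[0] = dist(test[0], ref[0])  # endpoint constraint w(1) = 1

    for n in range(1, n_frames):
        cur = [INF] * n_frames
        cur_flat = [False] * n_frames
        for m in range(n_frames):
            horiz = INF if prev_flat[m] else prev[m]
            diag = prev[m - 1] if m >= 1 else INF
            skip = prev[m - 2] if m >= 2 else INF
            best = min(horiz, diag, skip)
            if best == INF:
                continue             # point not reachable under the constraints
            cur[m] = dist(test[n], ref[m]) + best
            # Assumption: on ties the non-horizontal predecessors win.
            cur_flat[m] = horiz < min(diag, skip)
        prev, prev_flat = cur, cur_flat

    return prev[-1]                  # endpoint constraint w(N) = N

# Scalar "frames" with an absolute-difference distance, for illustration.
abs_diff = lambda a, b: abs(a - b)
print(dtw_itakura([0, 1, 1, 2], [0, 1, 2, 2], abs_diff))
```

Only the previous column is retained, which mirrors the observation made later in the paper that a single column of accumulated distances is sufficient.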

The terminal endpoint conditions must be satisfied by applying
global constraints, as illustrated in Fig. 1b. The parallelogram
constraint shown forms the basic constraint requirement such that local
constraint (7) is satisfied from either endpoint. Sakoe and Chiba” have
found that a further global constraint limiting the deviation of the
solution path from the diagonal to ±R substantially reduces the
number of distance calculations necessary without suffering a
significant loss in accuracy. In this implementation R is five.

Fig. 1—Dynamic time warp for (a) local constraints, and (b) global
constraints.

Comparison of a test frame to a reference frame requires a measure 
of closeness in some sense. Several distance measures have been 
investigated and used for utterance comparison purposes. A method 
developed by Itakura‘* has been found to yield high recognition accu- 
racy and costs relatively little in computation. This distance function, 
often referred to as the log likelihood ratio (LLR), yields numerical 
values that are indicative of the spectral energy difference between 


444 TECHNICAL JOURNAL, MARCH 1984 


the two frames of speech. The form of the function is

$$d(x, y) = \ln\!\left[\frac{a_r V_t a_r^T}{a_t V_t a_t^T}\right], \qquad (10)$$


where $t$ refers to a test frame and $r$ refers to a reference frame, $a$
is a $1 \times (p + 1)$ vector consisting of a 1 followed by $p$ LPC
coefficients of a $p$th-order LPC model of the speech, and $V$ is the
$(p + 1) \times (p + 1)$ autocorrelation matrix of the test frame. By
appropriately computing a set of reference coefficients and test
coefficients, $d(x, y)$ can be obtained as the result of a
$(p + 1)$-point dot product as described by
Itakura.*
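
One way to see why a $(p + 1)$-point dot product suffices: when $V$ is a Toeplitz autocorrelation matrix, the quadratic form $a V a^T$ equals $r(0) r_a(0) + 2 \sum_{i=1}^{p} r(i) r_a(i)$, where $r$ holds the signal autocorrelations and $r_a$ the autocorrelations of the coefficient vector. The sketch below checks this identity numerically; the second-order coefficients and autocorrelation values are made up for illustration, and the exact scaling used in the processor is not specified here.

```python
import math

def coeff_autocorr(a):
    """Autocorrelation of the coefficient vector a = [1, a1, ..., ap]."""
    p = len(a) - 1
    return [sum(a[j] * a[j + i] for j in range(p + 1 - i)) for i in range(p + 1)]

def quad_form_toeplitz(a, r):
    """Direct evaluation of a V a^T with V[i][j] = r[|i - j|]."""
    n = len(a)
    return sum(a[i] * r[abs(i - j)] * a[j] for i in range(n) for j in range(n))

def quad_form_dot(a, r):
    """Same quantity as a (p + 1)-point dot product of autocorrelations."""
    ra = coeff_autocorr(a)
    return r[0] * ra[0] + 2.0 * sum(r[i] * ra[i] for i in range(1, len(a)))

def llr(a_ref, a_test, r_test):
    """Log likelihood ratio of eq. (10) for one test/reference frame pair."""
    return math.log(quad_form_dot(a_ref, r_test) / quad_form_dot(a_test, r_test))

# Hypothetical second-order example (p = 2).
r_test = [1.0, 0.5, 0.25]    # test-frame autocorrelations
a_test = [1.0, -0.5, 0.0]    # LPC coefficients fitted to the test frame
a_ref = [1.0, -0.9, 0.2]     # LPC coefficients of some reference frame

print(quad_form_toeplitz(a_ref, r_test), quad_form_dot(a_ref, r_test))
print(llr(a_ref, a_test, r_test))
```

Because the test-frame coefficients minimize the denominator, the ratio is at least one and the distance is non-negative; it is zero when the reference and test coefficients coincide.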

The DTW algorithm, which has traditionally been performed by a 
general-purpose processor or microprocessor, requires 360 distance 
calculations, 630 two-way comparisons, and 359 additions for eighth- 
order LPC. We have developed a finite-state machine specifically for 
DTW calculations which performs these calculations in 902.5 micro- 
seconds. The processor currently operates with a 4-MHz clock, but it 
is estimated from design analysis that the clock rate could be increased 
to 6 MHz for the board-level implementation. 
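
The count of 360 distance calculations can be reproduced from the constraints. The sketch below assumes the allowed region is bounded by the local constraint applied from both endpoints (at most one repeated reference frame, at most one skipped frame per step) together with the ±R band; these bounds are an interpretation chosen because they reproduce the stated total, and the helper names are illustrative.

```python
N_FRAMES = 40   # normalized template length
BAND = 5        # Sakoe and Chiba range limit R

def lower(n: int) -> int:
    """Lowest reference frame allowed for test frame n (1-based)."""
    return max((n + 1) // 2, 2 * n - N_FRAMES, n - BAND)

def upper(n: int) -> int:
    """Highest reference frame allowed for test frame n (1-based)."""
    return min(2 * n - 1, N_FRAMES + 1 - (N_FRAMES + 2 - n) // 2, n + BAND)

def distances_in_column(n: int) -> int:
    return upper(n) - lower(n) + 1

counts = [distances_in_column(n) for n in range(1, N_FRAMES + 1)]
total_distances = sum(counts)
total_additions = total_distances - 1   # every point except D(1, 1)

print(counts)
print(total_distances, total_additions)
```

Under these assumptions no column holds more than 2R + 1 = 11 points, and the totals agree with the 360 distance calculations and 359 additions quoted above.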

In the following section we will discuss the architecture of the 
DTWP and explain its operation. Modules of particular interest are 
the combinatoric logic and the logarithm calculation logic, which will 
be described in Sections III and IV. Some preliminary specifications 
on the VLSI implementation will be discussed in Section V. Options 
of system integration are covered in Section VI with an example of a 
multiprocessor system implementation. 


II. PROCESSOR ARCHITECTURE


The DTWP architecture is specifically designed to implement a 
particular DTW algorithm used in the speech recognizer described by 
Ackenhusen and Rabiner.*? Both the test utterance and reference 
template contain 40 frames each of LPC features. The Itakura con- 
straints are applied so that expansion and compression ratios of the 
reference time axis with respect to the test time axis are limited to 
two to one or one half to one, respectively. Furthermore, the additional 
constraints described by Sakoe and Chiba? are applied (with R = 5) so 
that the number of distance calculations per column is limited to 11. 
In part, the high effective processing speed obtained with the DTWP
is due to this specialization of the design. Changes in the number of
frames to be warped, the constraints, and the Sakoe constraint (R)
will require redesign of the DTWP.

The DTWP is interfaced to the control microprocessor via the data 
and address buses and four control lines from an I/O port. The 
microprocessor can load LPC values into both the test memory and 
reference memory. The test and reference templates are stored as
12-bit values, 9 values per frame. Thus, each template requires 4320 bits
of memory. The number of warps to be performed by the DTWP
depends on the number of reference patterns stored in reference
memory. An 8-bit register is provided in the DTWP for telling the
processor the number of patterns to warp. After this value is loaded,
the DTWGO signal is asserted by the microprocessor and the DTWP
takes over control of its memories. These memories are removed from 
the microprocessor bus and split so that test and reference memories 
are individually accessible. In this way, data are obtained from the 
two memories in parallel at the rate of 24 bits per clock cycle or 96 
million bits per second. 
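
The storage and bandwidth figures follow directly from the template format. A minimal sketch, assuming one 12-bit word is read from each memory on every cycle of the 4-MHz clock mentioned earlier:

```python
FRAMES_PER_TEMPLATE = 40
VALUES_PER_FRAME = 9
BITS_PER_VALUE = 12
CLOCK_HZ = 4_000_000

words_per_template = FRAMES_PER_TEMPLATE * VALUES_PER_FRAME
bits_per_template = words_per_template * BITS_PER_VALUE

# One test word and one reference word are fetched in parallel each cycle.
bits_per_cycle = 2 * BITS_PER_VALUE
bits_per_second = bits_per_cycle * CLOCK_HZ

# Words needed to hold the 256 templates addressed by the 8-bit register.
words_for_256_templates = 256 * words_per_template

print(bits_per_template, bits_per_cycle, bits_per_second, words_for_256_templates)
```

The 92,160 words needed for 256 templates fit within the 96K × 12 reference memory shown in the architecture diagram.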

Warps are performed every 902.5 microseconds. The DTWP may
be interrupted between warps so that the microprocessor can obtain
the accumulated warp distances for each reference. The DTWP keeps
track of the best distance and the corresponding reference index,
which can be read back by the microprocessor between warps. If the
DTWP is not interrupted between warps, it can perform 256 warps in
231 milliseconds.

An LPC distance is computed by multiplying the nine values of a 
test frame with the nine values of a reference frame, accumulating the 
sum of the products, and taking the logarithm (in this case a base two 
logarithm). These operations are performed by a 12 X 12 Wallace tree 
multiplier with a 24-bit accumulator in a pipelined fashion. Referring 
to Fig. 2, which shows a block diagram of the distance calculator, the 
sequence of operation is: 

1. Apply addresses to the test and reference memories 

2. Clock data into the multiplier input registers 

3. Clock the product into the accumulator 

4. Repeat steps 1, 2, and 3 eight times 

5. Clock the logarithm result into the combinatoric logic. 

Thus, it takes 13 clock cycles to obtain the first distance, and 10 
clock cycles for each succeeding distance. The first three operations 
require nine clock cycles to compute the nine-point dot product. 
Operation (5) is performed on the tenth clock cycle. 

A diagram of a DTW calculation is shown in Fig. 1. Each point on
the diagram indicates a distance calculation consisting of a 9-point
dot product and its logarithm, which is the LPC distance between one
test frame and one reference frame. After the distance is computed,
all calculations in the processor are performed in 16-bit unsigned
arithmetic. An accumulated distance is calculated at each point, which
consists of the local distance (or dot product) and the minimum of
three possible predecessor distances, as shown in Fig. 1a. Thus, if each
point is calculated sequentially (from bottom to top and from left to
right on the diagram), only the previous column of points needs to be
available at any one time.

Fig. 2—Distance calculator.

This is accomplished in the DTWP by a 14-stage shift register, as
shown in the block diagram of Fig. 3. The boundary conditions of the 
warp are established by inserting large values, which we will call 
infinities, into the shift register at the top and bottom of each column 
of distances. These infinities will never be chosen as a minimum by 
the minimum selector for the accumulated distance calculation. Thus, 
when columns of 11 distances are processed (the middle portion of the 
warping function), two infinities must be inserted into the shift register 
after each column to separate one column from another. The number 
of stages in the shift register (14) comes from an analysis of the central 
portion of the warp. The 11 distances and two infinities occupy 13 
stages. The 14th stage is necessary to position the accumulated dis- 
tances of the previous column at the correct inputs of the three-way 
minimum selector. In this way, accumulated distances of the preceding 
column and the local distances currently being calculated are properly 
associated. 

Fig. 3—Architecture of the dynamic time-warping processor. [Block
diagram. Labels: FEATURE EXTRACTION (8086 MICROPROCESSOR); REFERENCE
MEMORY, 96K × 12 (256 TEMPLATES); TEST MEMORY, 360 × 12; (OFF CHIP);
(ON CHIP); MULTIPLIER; LOGARITHM; (LOCAL DISTANCE); MINIMUM SELECT,
min (A, B, C); 14-STAGE SHIFT REGISTER; MINIMUM SELECT; MINIMUM STORE.]

The number of distances in a column is fewer than 11 near the ends of
the warp: it increases at the beginning of the warp and decreases at
the end. To properly align the columns of distances an appropriate
number of infinities must be inserted into the shift register after
each column. The number of distances and the number of infinities
needed for each column are stored in Read-Only Memory (ROM). The
sequence used in the DTWP is shown in Table I.

These values are critical to proper operation of the circuitry and 
cannot be usefully modified without corresponding changes in the 
surrounding hardware. Since one processor clock cycle is required to 
insert each infinity into the shift register, distance calculations are 
performed in parallel with the shifting operation. Normally, the
number of infinities to be inserted is fewer than 10, so the shifting is
completed before the next distance calculation is finished.
There are four columns of distances, however, that require 10 or more 
infinity shifts. In these cases the distance calculator enters a wait state 
after the 10th clock cycle until the shifts are completed. During a 
warp, six wait-state clock cycles are used. 
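
The padding sequence can be modeled under one plausible reading of the shift-register timing. The sketch assumes that every accumulated distance must have been shifted a fixed number of times (12 here) when the point directly to its right is computed, so that the last three register stages hold the three candidate predecessors. It also assumes the same column bounds as a parallelogram-plus-band region, and that each infinity beyond the ninth costs one wait-state cycle. All three are assumptions chosen because they reproduce the legible entries of Table I and the stated totals; none is taken from the paper.

```python
N_FRAMES = 40
BAND = 5
HORIZONTAL_DELAY = 12   # assumed shifts between D(n, m) and its use at (n+1, m)

def lower(n: int) -> int:
    return max((n + 1) // 2, 2 * n - N_FRAMES, n - BAND)

def upper(n: int) -> int:
    return min(2 * n - 1, N_FRAMES + 1 - (N_FRAMES + 2 - n) // 2, n + BAND)

def distances(n: int) -> int:
    return upper(n) - lower(n) + 1

def infinities(n: int) -> int:
    """Padding entries shifted in after column n (none after the last column)."""
    if n == N_FRAMES:
        return 0
    return HORIZONTAL_DELAY + lower(n + 1) - lower(n) - distances(n)

table = [(n, distances(n), infinities(n)) for n in range(1, N_FRAMES + 1)]
long_shift_columns = [n for n, _, inf in table if inf >= 10]
# Assumption: shifts overlap the next distance calculation except for the
# cycles beyond the ninth infinity.
wait_cycles = sum(max(0, inf - 9) for _, _, inf in table)

for row in table:
    print(row)
print(long_shift_columns, wait_cycles)
```

With these assumptions exactly four columns need 10 or more infinity shifts and the wait states total six clock cycles, matching the figures in the text.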

Table I—Sequence of distances and infinities for DTWP

Test                            Test
Frame  Distances  Infinities    Frame  Distances  Infinities
  1                              21
  2                              22      11          2
  3                   8          23      11          2
  4                   7          24      11          2
  5                   5          25      11          2
  6                   4          26      11          2
  7                   3          27      11          2
  8                   3          28      11          2
  9                   2          29      11          2
 10                   2          30      11          2
 11                   2          31      11          2
 12                   2          32      10          3
 13                   2          33      10          3
 14                   2          34       9          4
 15                   2          35       9          5
 16                   2          36       7          7
 17                   2          37       6          8
 18                   2          38       4         10
 19                   2          39       3         11
 20                   2          40       1          0

The last three stages of the 14-stage shift register deliver 16-bit
values to the three inputs of a three-way comparator shown in the
block diagram of Fig. 3. The first of these three comparator inputs
corresponds to a horizontal path segment, as shown in Fig. 1a. This
comparator input can be disabled by a control bit stored in an auxiliary 
shift register, which indicates if a horizontal path was taken when the 
previous column of distances was calculated. This auxiliary shift 
register is 12 stages long and is clocked at the same time as the 14- 
stage shift register. If the horizontal path input is not disabled, and it 
is found that this input gives the lowest accumulated distance, then 
the path disable bit is set in the first stage of the auxiliary shift register 
in preparation for the following column of distances. Further details 
on the three-way comparator are given in the next section. 

The output of the three-way minimum selector is added to the local 
distance from the distance calculator and the result is inserted into 
the first stage of the shift register between columns, until all 360 
distances have been calculated. At this point a 4-clock cycle sequence 
is entered, which compares the current warp distance with the best 
previously stored warp distance and updates the stored value if nec- 
essary. Also, during this 4-cycle sequence, internal registers are 
cleared, the reference index is updated, and the control logic is initial- 
ized for the next warp. Thus, a warp is performed in 3600 clock cycles 
for distance calculations, plus 6 clock cycles for wait states, and 4 
clock cycles for classification and reinitialization for a total of 3610 
clock cycles. 
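
The cycle budget translates directly into the warp times quoted in the paper. A minimal sketch of the arithmetic, using the 4-MHz clock stated earlier and, as a hypothetical case, the estimated 6-MHz clock:

```python
DISTANCES_PER_WARP = 360
CYCLES_PER_DISTANCE = 10
WAIT_STATE_CYCLES = 6
CLASSIFY_CYCLES = 4

def warp_cycles() -> int:
    """Total clock cycles for one dynamic time warp."""
    return DISTANCES_PER_WARP * CYCLES_PER_DISTANCE + WAIT_STATE_CYCLES + CLASSIFY_CYCLES

def warp_time_us(clock_mhz: float) -> float:
    """Duration of one warp in microseconds at the given clock rate."""
    return warp_cycles() / clock_mhz

total_cycles = warp_cycles()
time_at_4mhz_us = warp_time_us(4.0)
time_for_256_warps_ms = 256 * time_at_4mhz_us / 1000.0
time_at_6mhz_us = warp_time_us(6.0)   # hypothetical faster clock

print(total_cycles, time_at_4mhz_us, time_for_256_warps_ms, time_at_6mhz_us)
```

At 4 MHz this gives 3610 cycles, 902.5 microseconds per warp, and about 231 milliseconds for 256 uninterrupted warps.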
III. COMBINATORIC LOGIC


The three-way minimum selector performs the equivalent of three 
memory reads, two comparisons, and two memory writes in slightly 
more than 50 ns in Transistor-Transistor Logic (TTL). MOTIS sim- 
ulation indicates that the three-way comparison can be done in about 
75 ns in Complementary Metal-Oxide Semiconductor (CMOS) logic. 
This very fast selection logic, coupled with the shift register architec- 
ture, allows the combinatorics of the DTW to be performed as fast as 
distances can be calculated. A hardware adder combines the minimum 
values selected with the LPC distance from the distance calculator 
and delivers the result to the first stage of the shift register. This 
combinatoric procedure occurs within one 250-ns clock cycle. 

An expanded block diagram of the three-way minimum selector is 
shown in Fig. 4. The three-way selector is implemented as a pair of 
two-way selectors. The first selector determines the minimum of 
inputs A and B, and the second selector chooses between the A or B 
output of the first selector and input C. If input A is selected, the path 
disable bit is set in the auxiliary shift register. This bit is used to 
disable input C of the three-way minimum selector so that a horizontal 
path (in the sense of Fig. 1) can be inhibited. 

The three-way selector is designed so that it will favor input A or B 
in the event that all three inputs are presented with equal values. This 
characteristic is considered to be the best since it assumes no expan- 
sion or compression of the test utterance with respect to the reference 
template and does not constrain the horizontal path unnecessarily. 

The adder is a standard logic circuit. It performs a 16-bit unsigned 
addition in about 35 ns. The output of this adder is made available to 
the shift register at the next processor clock time. 











Fig. 4—Minimum selector. 
IV. LOGARITHM FUNCTION LOGIC


It is well known that the logarithm of a number can be estimated 
from a geometric series. Since we are applying the logarithm as a 
multiplicative function (and performing a relative comparison of the 
results), the base of the log is unimportant. The logarithm function 
circuitry makes use of a first-order approximation of the log base two 
in the range of one to two as follows: 


$$\log_2(x) \approx x - 1 \quad \text{for } 1 \le x \le 2. \qquad (11)$$


The maximum amount of error between the straight line approximation
and the base two logarithm function is less than 0.09 in magnitude.
Relative error approaches about 30 percent near x = 1; however, this
has little impact on the relative comparison of accumulated distances.
A plot of logarithm calculation error versus x is shown in Fig. 5a.

The logarithm for values of x greater than two can be approximated
by dividing x by the largest value of $2^n$ that results in a quotient
that is greater than or equal to one. Thus, the logarithm base two may
be approximated with a piecewise linear function as:


$$\log_2(x) \approx n + \frac{x}{2^n} - 1 \quad \text{for } x \ge 1 \text{ and } 1 \le \frac{x}{2^n} < 2. \qquad (12)$$


This function is implemented with a priority encoder and multiplexor
as shown in Fig. 5b. The priority encoder determines the value
of n and the multiplexor performs the division by $2^n$. This logic
functions accurately only for values of x greater than or equal to 1.0.
The TTL design requires four DIP packages and yields a propagation
delay of about 25 ns. The implementation of Very Large-Scale
Integration (VLSI) has been simulated with MOTIS and requires 35 ns
for the equivalent operation using 2.5-µm CMOS. This is a substantial
improvement over an earlier design that used an Erasable Programmable
Read-Only Memory (EPROM) with an access time of about 350
ns. For VLSI design, this approach is much more desirable since it
only requires about 200 gates as compared to about 25,000 bits of
ROM.
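
The piecewise-linear logarithm of eq. (12) is easy to model in software. In the sketch below the position of the leading one bit plays the role of the priority encoder and a right shift plays the role of the multiplexor; the fixed-point variant and its 8 fractional bits are illustrative assumptions, not the word format of the actual circuit.

```python
import math

def log2_pwl(x: int) -> float:
    """Piecewise-linear approximation of log2(x) for integer x >= 1."""
    if x < 1:
        raise ValueError("approximation is defined only for x >= 1")
    n = x.bit_length() - 1            # priority encoder: index of leading one
    return n + x / (1 << n) - 1.0     # n + (x / 2^n) - 1

def log2_pwl_fixed(x: int, frac_bits: int = 8) -> int:
    """Same approximation as an integer with `frac_bits` fractional bits."""
    if x < 1:
        raise ValueError("approximation is defined only for x >= 1")
    n = x.bit_length() - 1
    # The bits below the leading one form the fraction (multiplexor shift).
    fraction = ((x - (1 << n)) << frac_bits) >> n
    return (n << frac_bits) | fraction

worst_error = max(math.log2(x) - log2_pwl(x) for x in range(1, 1 << 16))
print(log2_pwl(3), log2_pwl_fixed(3) / 256, worst_error)
```

The approximation is exact at powers of two, and over the 16-bit range tested the largest absolute error stays below the 0.09 bound stated above.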


V. CHIP DESCRIPTION 

The DTWP chip will be packaged in a 68-pin chip carrier. The 
allocation of pins is given in Table II. The 68-pin chip carrier is a 
standard of AT&T Technologies, Inc. The six additional pins will be 
used for testing purposes. 

Fig. 5—(a) Logarithm calculation error. (b) Logarithm calculator.

The data pins will be multiplexed so that the DTWP can access the
reference and test memories while it is calculating a warp path, and
the microprocessor can access internal DTWP registers while the
DTWP is idle. Access to the internal DTWP registers is controlled by
three Read/Write Control lines. The Interrupt Enable and Interrupt
Flag will allow the microprocessor to halt the DTWP temporarily while
the internal registers are being read. DTWP Busy is provided to aid in
interfacing the test and reference memories, which must appear to be
dual-port memories to the DTWP. The Hard Reset allows the
microprocessor to reset internal registers and clear all pending DTWP
operations.

Table II—DTWP chip pin allocation

Reference addressing
Reference data and DTWP low byte for distance, index, number of warps
Test addressing
Test data and DTWP high byte
Read/write control:
*Load register
†DTWP go
*Read distance
*Read best distance
*Read index
†Interrupt enable
†Interrupt flag
†DTWP busy
*Hard reset
Clock
Power
Ground
Total pins currently (others used for testing)

* Address decoded.
† I/O control.

As shown in Table II, several registers are available to the micro- 
processor. The Distance Register will allow the microprocessor to read 
the accumulated warp distance of each warp, if desired, under control 
of the Interrupt function. This is accomplished by enabling the DTWP 
interrupt via Interrupt Enable and waiting for the Interrupt Flag to be 
asserted. At this time the DTWP is idle and the microprocessor can
read the internal distance register, the best distance found so far, and 
the index of the best reference template. When the read is complete, 
the Interrupt Enable is lowered by the microprocessor, the DTWP 
then clears the Interrupt Flag and proceeds with the next warp. If the 
distance for the next warp is required by the microprocessor, then it 
must enable the interrupt within 900 microseconds in order to catch 
the next interrupt. 

The multiplier consists of three stages. The first latches the multi- 
plicand and multiplier and generates six partial products in parallel 
using a two bit at a time algorithm.° The partial products are added 
together in the last two stages, which are a bit slice adder and a full- 
carry look-ahead adder. Since the output of the multiplier is latched 
and then gated back to the top of the bit slice adder, accumulation of 
products is done along with partial product addition when required. 
This operation is selected or inhibited by the gate. 
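
Six partial products for a 12-bit multiplier are what a radix-4 recoding produces, since each partial product then covers two multiplier bits. The sketch below uses modified Booth recoding as one well-known two-bits-at-a-time scheme; whether the DTWP multiplier uses this particular recoding, and whether its operands are signed, is an assumption made here for illustration.

```python
def booth_radix4_partial_products(multiplicand: int, multiplier: int, bits: int = 12):
    """Partial products of multiplicand * multiplier, two multiplier bits at a time.

    `multiplier` is interpreted as a signed two's-complement value of
    `bits` bits (bits must be even). Returns bits // 2 partial products.
    """
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    pattern = multiplier & ((1 << bits) - 1)
    partial_products = []
    previous_bit = 0
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digit = b0 + previous_bit - 2 * b1        # recoded digit in {-2, ..., 2}
        partial_products.append((digit * multiplicand) << i)
        previous_bit = b1
    return partial_products

pps = booth_radix4_partial_products(1234, -567)
print(len(pps), sum(pps), 1234 * -567)
```

Summing the six partial products reproduces the ordinary product, which is the job of the two adder stages described above.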

As we mentioned previously, the multiplier accumulator performs a
12- × 12-bit multiply and 24-bit accumulate in less than one clock
cycle (250 ns). Thus, each clock pulse latches new data into the first
stage and simultaneously latches the previous result at the multiplier
output in a pipeline manner.

Fig. 6—DTWP interfacing.

Current estimates indicate that the chip will contain about 6,600 
gates, or 40,000 transistors. Of these, approximately 11,000 transistors 
comprise the multiplier-accumulator. This should result in a chip size 
of about 9 mm by 9 mm when implemented by polycell design tech- 
niques in 2.5-um CMOS technology. 


VI. SYSTEM IMPLEMENTATION 


A block diagram of a typical DTWP configuration is shown in Fig.
6. Both DTWP test and reference template memories are accessible
to the microprocessor via the bus by appropriate addressing. The
number of DTWPs that can be handled by the microprocessor depends
on the amount of memory address space for reference memory,
since each reference memory requires 192K bytes of address space.
The 8086 microprocessor, for example, can handle only about four 
DTWPs. If the reference memory is stored in ROM, then this address- 
ing limitation is removed. 
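
The addressing limit can be checked with a short calculation. This is a sketch that assumes the full 1-Mbyte (20-bit) address space of the 8086 is shared between the reference memories and everything else the microprocessor needs; the helper name `space_left` is hypothetical.

```python
ADDRESS_SPACE_BYTES = 1 << 20        # 8086: 20 address lines, 1 Mbyte
REFERENCE_BYTES = 192 * 1024         # address space per DTWP reference memory


def space_left(n_dtwps: int) -> int:
    """Bytes of address space remaining after mapping n reference memories."""
    return ADDRESS_SPACE_BYTES - n_dtwps * REFERENCE_BYTES


for n in range(1, 7):
    print(n, space_left(n) // 1024, "Kbytes left")
```

Four DTWPs leave 256 Kbytes for program, test memories, and I/O; five would leave only 64 Kbytes, and six do not fit, which is consistent with the figure of about four quoted above.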

It should be noted that the test and reference memories here appear 
to be dual port. This is accomplished by use of tri-state buffers that 
are controlled by the DTWP Busy signal. When this signal indicates a
warp is in progress, the memories are floated off the microprocessor
address/data/control buses and connected to the DTWP address/data
bus. This allows the microprocessor to have access to the system bus
so that it may execute another program during DTWs.

Each DTWP is controlled by the microprocessor through either 
I/O control signals or memory address lines. The memory address 
must be decoded by outside logic to provide the desired reset, load, or 
read command. The Interrupt Flag and DTWP Busy can be handled by 
the microprocessor through polling or as a true interrupt. 

The DTWP Reset should be triggered on power up and should also 
be able to be triggered by software. This clears all DTWP registers
and resets internal control logic to standby. In this mode the test and 
reference memories are available to the microprocessor. 

The reference memories are loaded by the microprocessor in the 
training mode with the appropriate number of reference templates (up 
to 512 templates in this case). At this point the recognition mode is 
entered. The two test memories are loaded by the microprocessor with 
an unknown test utterance. Each DTWP must have its own copy of 
the test word. The Load Register line is enabled and a microprocessor 
write cycle is performed to enter into each DTWP the number of 
reference templates to compare. This can be performed simultaneously 
for all DTWPs if the same number is to be written to each DTWP.

When the DTWP Go is triggered for each DTWP, the DTWP Busy 
flag for each processor will be asserted. Each DTWP will then sequentially
compare the test word against each reference in its reference
memory until the designated number of references have been checked.
Optionally, within 900 μs after the DTWP Go, the user may assert the
Interrupt Enable and thus cause the DTWP to enter an idle state after
completing the first warp. At this point the microprocessor may read
the distance score for the first comparison. Clearing the Interrupt
Enable causes the DTWP to resume operation. If the Interrupt Enable
is asserted again within 900 μs, the DTWP will stop after the next
warp. In this way the distances for each comparison may be obtained
by the user. When the specified number of warps is complete, the
DTWP Busy flag will be cleared and the DTWP enters the idle state.
The microprocessor can read the Best Distance and Index by sending 
the appropriate Read Enable signal to the DTWP and performing a 
read cycle. Each DTWP must be handled separately by the micro- 
processor during a read cycle so that data from the two DTWPs are 
not mixed together. In this configuration, 512 warps can be performed 
in about 231 ms if intermediate distance readings are not taken. 
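
The timing figure follows from the per-warp time of 902.5 μs given in the Summary. The sketch below assumes the two DTWPs of this configuration run in parallel, each comparing against 256 references, and that no intermediate readings stall them; the function names are hypothetical.

```python
WARP_TIME_US = 902.5   # time for one dynamic time warp, from the Summary


def batch_time_ms(warps_per_dtwp: int) -> float:
    """Elapsed time for one DTWP to run its warps back to back."""
    return warps_per_dtwp * WARP_TIME_US / 1000.0


def warps_per_second() -> float:
    """Sustained warp rate of a single DTWP."""
    return 1e6 / WARP_TIME_US


print(batch_time_ms(256))          # elapsed time for 2 x 256 = 512 warps in parallel
print(int(warps_per_second()))     # warps per second for one DTWP
```

Under these assumptions each DTWP finishes its 256 warps in about 231 ms, so 512 warps complete in the same elapsed time.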


VII. SUMMARY 


The dynamic time-warping processor described here is a key element
in the single-board isolated word recognizer. The DTWP chip will
perform dynamic time warps at about 50 times the speed of the
currently used microprocessor-based hardware. A single DTW requires
902.5 μs (3610 clock cycles at 4 MHz). The number of warps performed
is controllable by the microprocessor and may currently be set for up
to 256 warps. The DTWP also performs a single nearest neighbor rule
classification. The index of the best reference candidate, warp distance
to the best reference checked, and warp distance to the current
reference are available to the control microprocessor after each warp
by raising an Interrupt Enable Input to the DTWP. This will cause the
DTWP to interrupt its processing at the end of the current warp and 
wait for the control microprocessor to read data from the DTWP 
registers. If the DTWP is not interrupted between warps, it can 
perform 256 warps in 231 ms (a rate of about 1108 warps per second). 
The architecture and logic design have been tested with a TTL 
board-level implementation. A VLSI multiplier chip is the most com- 
plicated device used. Except for ROM memories, all of the other logic 
is of Medium-Scale Integration (MSI) complexity requiring about 100 
packages. A 96K by 12-bit reference memory and a 1K by 12-bit test 
memory are used for template storage. The processor consumes about 
1.8 INTERPAC* 13-inch wirewrap boards, including reference mem- 
ories. The DTWP chip, which will replace these 100 packages, is 
currently projected to contain about 6600 gates or 40,000 transistors.

Automatic speech recognition has advanced to the stage where it is now
possible to recognize connected strings of words (e.g., digits, letters, city names, 
airline terms) from a word reference set of isolated tokens of each of the words 
in the vocabulary. Recently, an improved training technique called embedded 
word training was proposed, in which reference word patterns were extracted 
from within connected word sequences themselves. In this investigation we 
extend the embedded word training procedure to handle letters of the alphabet 
for use in a directory listing retrieval task. By performing connected letter 
recognition of spoken names based on letter classes (rather than specific 
letters themselves), we show how reliable name recognition results can be 
achieved using a fairly straightforward system on 200 randomly chosen names 
(chosen from an 18,210-name directory) spoken at a normal rate by four 
talkers (three male, one female) in a speaker-trained mode. We have found 
that an 8-percent improvement in name recognition accuracy is obtained when 
using embedded letter training patterns over that obtained from isolated letter 
patterns alone. The overall name recognition accuracy was close to 95 percent. 


I. INTRODUCTION 


Research in the area of isolated word recognition has progressed to 
the state where a wide variety of practical recognition systems exist 
both in the laboratory and in the commercial world.’’ These systems
are often capable of handling medium- to large-size (100- to 1000-word)
vocabularies; they can work in both speaker-trained and
speaker-independent modes; they can work over dialed-up (local)
telephone lines; and they can take advantage of task syntax to improve
overall system accuracy. The major shortcoming of these recognition
systems is the isolated word format itself, since it is highly unnatural
for use in a wide variety of tasks (e.g., digit dialing, word spelling,
etc.).

Fig. 1—Illustration of connected word recognition by concatenation of individual
reference patterns.

The area of connected word recognition has made great strides 
forward in the last few years, and it has reached the point where there 
are several laboratory and commercial systems that attain some limited
degrees of success.*"' Figure 1 summarizes the basic idea in a
pattern-based approach to connected word recognition. Assume we
are given a test pattern, $T$, which represents an unknown spoken-word
string, and we are given a set of $V$ reference patterns, $\{R_1,
R_2, \ldots, R_V\}$, each representing some word of the vocabulary. The
connected word recognition problem consists of finding the “super”
reference pattern $R^*$,

$$R^* = R_{q(1)} \oplus R_{q(2)} \oplus \cdots \oplus R_{q(L)}. \tag{1}$$

This is the concatenation of $L$ reference patterns, $R_{q(1)},
R_{q(2)}, \ldots, R_{q(L)}$, which best matches the test string, $T$, in the sense
that the overall distance between $T$ and $R^*$ is minimum over all
possible choices of $L, q(1), q(2), \ldots, q(L)$, where the distance is an
appropriately chosen distance measure.

There are several problems associated with solving the above connected
word recognition problem. First, we don’t know $L$, the number
of words in the word string. Hence, our proposed solution must provide
the best matches for all reasonable values of $L$, e.g., $L = 1, 2, \ldots,
L_{\max}$. Second, we don’t know, nor can we reliably find, word boundaries,
even when we have postulated $L$, the number of words in the
string. The implication of this observation is that our word recognition 
algorithm must work without direct knowledge of word boundaries; in 
fact, the estimated word boundaries will be shown to be a byproduct 
of the matching procedure. The third problem with a template match- 
ing approach is that the word matches are generally much poorer at 
the boundaries than at frames within the word. In general, this is a 
weakness of word-matching schemes, which can be somewhat alle- 
viated by the matching procedures that can apply lesser weight to the 
match at template boundaries than at frames within the word. A 
fourth problem is that word durations in the string are often grossly 
different (shorter) from the durations of the corresponding reference 
patterns. To alleviate this problem, one can use some time prenor- 
malization procedure” to warp the word durations accordingly, or rely 
on reference patterns extracted from embedded word strings, as will 
be described later in this paper. Finally, the last problem associated 
with matching word strings is that the combinatorics of matching 
strings exhaustively (i.e., by trying all combinations of reference 
patterns in a sequential manner) is prohibitive. 
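
The size of the exhaustive search is easy to quantify. The sketch below counts the candidate strings when every string of 1 to L_max words over a vocabulary of V reference patterns is tried; the function name and the example values (V = 26, L_max = 6) are illustrative assumptions, not figures from the text.

```python
def exhaustive_candidates(vocab_size: int, max_words: int) -> int:
    """Number of distinct word strings of length 1..max_words."""
    return sum(vocab_size ** length for length in range(1, max_words + 1))


# One reference pattern per letter, strings of up to six letters:
print(exhaustive_candidates(26, 6))
```

Each candidate would also need its own time alignment against the test pattern, which is why the dynamic programming methods discussed next share partial alignments instead of enumerating strings.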

A number of different ways of solving the connected word recogni- 
tion problem, which avoid the plague of combinatorics mentioned 
above, have been proposed. Among these algorithms are the two-level 
Dynamic Programming (DP) approach of Sakoe,⁸ the level-building
approach of Myers and Rabiner,⁹ the parallel single-stage approach of
Bridle et al.,¹⁰ and the nonuniform sampling approach of Gauvain and
Mariani.¹¹ Although each of these approaches differs greatly in implementation,
all of them are similar in that the basic procedure for
finding $R^*$ is to solve a time-alignment problem between $T$ and $R^*$
using Dynamic Time Warping (DTW) methods.
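
For readers unfamiliar with DTW, a minimal textbook sketch of the time-alignment step is given here. It is not the level-building algorithm or any of the cited methods: it assumes scalar frames, an absolute-difference local distance, unconstrained endpoints at the two corners, and the simplest three-way local path rule. The name `dtw_distance` is hypothetical.

```python
def dtw_distance(test, ref, dist=lambda a, b: abs(a - b)):
    """Accumulated distance of the best warping path between two sequences."""
    n, m = len(test), len(ref)
    inf = float("inf")
    # acc[i][j] = best accumulated distance aligning test[:i] with ref[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(test[i - 1], ref[j - 1])
            acc[i][j] = local + min(acc[i - 1][j],      # stretch the reference frame
                                    acc[i][j - 1],      # stretch the test frame
                                    acc[i - 1][j - 1])  # advance both
    return acc[n][m]
```

A test sequence that is a time-stretched copy of the reference aligns at zero distance, which is the property that makes DTW useful when word durations in a string differ from those of the reference patterns.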

Figure 2 illustrates the level-building DTW-based approach to con- 
nected word recognition. Shown in this figure are the warping paths 
for all possible length matches to the test pattern, along with the 
implicit word boundary markers ($e_1, e_2, \ldots, e_{L-1}, e_L$) for the dynamic
path of the L-word match. The level-building algorithm builds up all 
possible L-word matches one level (word in the string) at a time. For 
each string match found, a segmentation of the test string into appro- 
priate matching regions for each reference word in R* is obtained. In 
addition, for every string length L, the best 6 matches (i.e., the 6 
lowest-distance L-word strings) can be found. The details of the actual 
level-building algorithm are available elsewhere,⁹ and will not be
discussed here. Instead, we will rely on the properties of the algorithm,
mentioned above, to show how we can use them to obtain improved
speaker-independent word reference patterns.

Fig. 2—Sequence of level-building DTW warps to provide best word sequences of
several different lengths. (Axes: test frame versus reference frame; the paths shown are the best two-word through best L-word matches.)

Generally, the single word reference patterns used in the matching 
procedure of Figs. 1 and 2 are chosen as isolated occurrences of each 
vocabulary word (often obtained by some form of robust training 
procedure’*). This form of training is adequate as long as the rate of 
articulation of the spoken connected word strings is not too high (e.g., 
typically fewer than 150 words per minute). However, for high rates 
of articulation, problems occur in the matching due to the gross 
differences between isolated words and those in fluent strings. 

One solution to the problem of high rate of articulation is to use 
reference tokens extracted from connected word strings to supplement 
the isolated word reference tokens.’* Such embedded training tokens 
are extracted from known training strings and can be used in a
modified form of the robust training procedure to give robust embedded
training tokens for each word of the vocabulary. Such techniques have 
been applied to the problem of connected digit recognition in both the 
speaker-trained mode,” and in the speaker-independent mode,” with 
very good success. The connected digits case was a natural first 
application since the number of environments in which each digit 
occurs is strictly limited (i.e., there are at most 10 predecessor digits 
and 10 following digits; furthermore, there is great similarity in many 
of these combinations). 

Another natural application of connected word recognition is the 
case of recognition of connected letters for retrieving a name from a 
fixed directory of names.’*!” Early experimentation with this system 
indicated that the task was a viable one (if the rate of articulation was 
not too high), based on isolated reference patterns for each letter.'* In 
this paper we extend the embedded training procedure to handle the 
case of connected letters, and show how the resulting embedded letter 
templates can be used to improve recognition performance in a name 
retrieval task. The systems we used are speaker-trained ones; however, 
previous studies indicate that our results can readily be extended to 
the speaker-independent case.” 

The organization of this paper is as follows. In Section II we review 
the structure of the overall connected letter recognition system and 
show how it can be used to retrieve the “best” matching name in a 
fixed directory of names. In Section III we discuss an evaluation of 
the overall connected letter, directory listing retrieval system, running 
in a speaker-dependent mode. In Section IV we discuss the results and 
give an analysis of the errors. Finally, in Section V we summarize our 
results and our main conclusions. 


II. THE CONNECTED LETTER RECOGNITION SYSTEM


Figure 3 is a block diagram of the connected letter recognition 
system, as it is used in the directory listing retrieval application. The
system operates as follows. A user spells the last name of the person 
for whom directory information is desired as a connected sequence of 
spoken letters, followed by a brief pause, followed by the initials (again 
as a connected sequence). A conventional endpoint detector’? finds 
the beginning and ending of each of the two spoken strings. An 
example of such endpoint detection is given in Fig. 4, where the dashed 
lines indicate the speech endpoints. An 8-pole Linear Predictive Coef- 
ficient (LPC) analysis is performed on each frame of both the spoken 
last name and the initials, where the analysis frame size is 45 ms and 
consecutive frames are spaced 15 ms apart. For both the spoken last 
name and the initials, a level-building Dynamic Time Warping (DTW) 
fit to a set of letter classes is made (based on letter reference patterns)
and both the individual letter scores and the class scores are saved.
Last name class scores for all possible last name classes are generated
and sorted by distance. A name generator sequentially goes through
the sorted class list and generates all valid names within the class (i.e.,
those stored in the phone book). A name score generator uses the
letter scores to give a total name score for each name within each
class. Name scores are sorted in a list according to total name distance.
Classes are searched until the best possible name score exceeds a
specified threshold (related to the best name distance achieved so far).
A list of the best name scores is then returned and the name recognized
is the one at the top of the list.

Fig. 3—Automatic directory listing retrieval system based on connected letter spelling
of names.

Fig. 4—Intensity contour for spelled name. The first set of vertical dashed lines
delimits the beginning and end of the spoken last name; the second set of vertical dashed
lines delimits the initials. The horizontal dashed lines indicate energy thresholds from
which the beginning and ending frames are found.
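
The frame layout used for the LPC analysis (45-ms frames spaced 15 ms apart) can be sketched as follows. The sampling rate is not stated in this section, so the 8000-Hz value in the usage example is purely an assumption, and `frame_bounds` is a hypothetical helper.

```python
def frame_bounds(n_samples: int, sample_rate_hz: int,
                 frame_ms: int = 45, shift_ms: int = 15) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of each complete analysis frame."""
    frame_len = sample_rate_hz * frame_ms // 1000
    shift = sample_rate_hz * shift_ms // 1000
    last_start = n_samples - frame_len
    return [(start, start + frame_len) for start in range(0, last_start + 1, shift)]


# One second of speech at an assumed 8000-Hz sampling rate:
bounds = frame_bounds(8000, 8000)
print(len(bounds), bounds[:2])
```

With these settings consecutive frames overlap by two-thirds of their length, so each 15 ms of speech contributes to three analysis frames.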

In the remainder of this section we briefly describe the individual 
steps used to recognize the name. First we describe some characteris- 
tics of the letter classes and the phone book and then discuss the way 
in which we extract the embedded letter reference patterns for each 
talker. Next we review the level-building DTW algorithm and follow 
with a discussion of the name generation, name scoring, and decision 
rules. 


2.1 Classification of letters into letter classes 


The concept of blocking letters into letter classes, for purposes of 
speech recognition, was introduced by Aldefeld et al.”° for the con- 
nected letter recognition application. The basic idea is that highly 
accurate recognition of spelled letters (over dialed-up telephone lines) 
cannot be achieved. Hence it is preferable to combine highly confusable 
letters into letter classes, perform recognition on letter classes, and 
decode the letter classes into actual directory names by searching a 
directory sorted by letter class combinations. Name scores are gener- 
ated on the basis of individual letter scores (which are also generated 
in the recognition phase). 

In particular, the 26 letters of the alphabet were assigned to 3 letter
classes as shown in Table I. (A fourth class, class 0, contains the space
character, $.) Class 1 contains the /EE/ letters, whereas classes 2 and
3 are a partitioning of the remaining 17 letters into two disjoint sets
with minimal interclass confusion.”* We denote the total number of
classes as C.

Table I—Assignment of letters into letter classes
(Columns: letter classes 0, 1, 2, and 3; the individual letter entries are illegible in this copy.)

For each name in the directory a set of I indices for the last name 
are defined. These indices define the letter class of each letter of the 
last name. Hence the names Rabiner and Wilpon would be represented 
by the indices: 


NAME:   RABINER    WILPON
CLASS:  3212313,   223123.


If we restrict ourselves to using I = 6 indices for the last name, and
we adopt the convention that we use the character $ to pad out last
names of fewer than six letters, then we have a total of

    $3^6 = 729$  classes with 6 letters
    $3^5 = 243$  classes with 5 letters
    $3^4 = 81$   classes with 4 letters
    $3^3 = 27$   classes with 3 letters
    $3^2 = 9$    classes with 2 letters
    $3^1 = 3$    classes with 1 letter

    1092 total classes.


After sorting an AT&T Bell Laboratories directory of 18,210 names 
according to the letter class assignment, a total of 1053 of the 1092
classes actually had one or more names assigned to them. Hence, coding
of the last name to six indices is an efficient representation in terms 
of usage of possible letter classes. 
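
The class count and the occupancy figure can be reproduced with a few lines; the function name `total_classes` is hypothetical.

```python
def total_classes(n_letter_classes: int = 3, max_letters: int = 6) -> int:
    """Number of distinct class strings of 1..max_letters letters."""
    return sum(n_letter_classes ** k for k in range(1, max_letters + 1))


occupied = 1053                        # classes with at least one directory name
total = total_classes()
print(total, round(100 * occupied / total, 1))
```

About 96 percent of the possible last name classes are occupied, which is the sense in which the six-index coding is efficient.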


2.2 Extraction of embedded letter patterns 


The set of letter reference patterns for each talker consisted of three 
robust tokens of each letter, obtained as follows. An isolated robust 
token was obtained in the conventional manner, i.e., the talker spoke 
the letter repeatedly until two tokens were sufficiently similar (at a 
small enough distance) that they could be averaged.'* Two embedded 
robust tokens of each letter were obtained by having the talker speak 
specified three-letter strings, extracting the middle letter via DTW 
alignment, and then using the standard robust training procedure on 
the embedded letters.’* One of the embedded robust tokens was ex- 
tracted from three-letter strings with minimal coarticulation between 
letters at the boundary. The other embedded robust token was ex- 
tracted from three-letter strings with strong coarticulation between 
letters at the boundary.

Table II—Training sequences used for extraction of robust embedded letters for A, E, S, and W

                 Letter
    A      E      S      W

    Noncoarticulated Sequences
    FAC    FEK    FSR    SWP
    HAP    SEQ    XSW    XWT
    XAT    XEK    HSL    FWK
    SAQ    HEK    XSN    HWQ
    FAP    FEQ    FSR    FWC
    HAQ    XEQ    HSW    SWK
    XAC    HEK    XSR    HWQ

    Coarticulated Sequences
    RAL    WEL    WSC    WWR
    MAN    NES    NSK    LWY
    PAY    MEL    LSP    MWW
    RAW    YEN    RSQ    LWR
    ZAL    WER    LST    MWN
    JAR    MEN    MSP    NWF
    DAN    DEY    YSK    TWN

By way of example, Table II shows the three-letter training se- 
quences used to obtain the robust tokens for the letters A, E, S, and 
W. The first seven sequences for each letter represent noncoarticulated 
strings; the next seven sequences for each letter represent coarticulated 
strings. The talker was only required to speak as many strings as 
required for obtaining a robust token of the letter. In theory as few as 
two strings could be adequate for this purpose; in practice it usually 
took four or five strings to give a pair of consistent embedded tokens. 
This is due to the high degree of variability of spoken letters in spelled 
strings. 

An important point should be made about the training sequences. 
In theory the three-letter sequences were obtained by randomly se- 
lecting an initial and a final letter from a set of candidates based on 
the manner of production of the middle letter (the one being extracted) 
at the beginning and end of the word. In fact we found that the 
conventional vowels, namely A, E, I, O, and U, could not be used in 
either initial or final position in the training strings. Such combina- 
tions almost always led to extremely poor alignment paths for deter- 
mining the embedded letter boundaries. As such, these letters were 
eliminated from consideration for use in the embedded training pro- 
cedure. 

The results of the training (which typically required about 30 
minutes per talker) were a set of three reference patterns for each 
letter or a total of 78 reference patterns for the 26 letters.

Table III—Example of use of LB DTW algorithm in recognizing a spoken name

(a) Level-Building Distance Scores for Last Name and Initial Classes

    Name:   ZBOYAN  AM
    Class:  112223  23

    Last Name Class Scores:
    l = 1, 2, 3, 4, 5    No matches found
    l = 6                Class     Distance
                         112213    0.214
                         112223    0.224
                         212213    0.246

    Initial Class Scores:
    l = 1                Class     Distance
                         2         0.449
    l = 2                23        0.232
                         13        0.267

(b) Overall Name Distance Scores

    Class     Name         Overall Distance
    112223    ZBOYAN AM    0.226
    112223    ZBOYAN DL    0.235


2.3 Level-building recognition procedure 


The recognition procedure is based on using the Level-Building 
(LB) DTW algorithm on strings of letter classes by using all 26 letters 
at each level but considering them only as different class templates. 
That is, different letters in the same letter class are considered as 
different templates for their common letter class. In the LB imple- 
mentation we keep track of the C best (class) candidates at each level 
and use the standard LB traceback algorithm’ to generate a name 
class score for each of the 1053 possible last name classes. 

By way of example, Table III illustrates the application of the LB 
DTW algorithm on the spoken name ZBOYAN AM. We use the C = 
3 letter classes shown in Table I. The first step is to generate last 
name class scores for each of the 1053 last name classes and to order 
these scores by distance. The results are shown in Table IIIa for the 
top three classes. Since a six-letter last name was spelled, the top
three last name class distance scores are for six-letter last names.
The last name class corresponding to the spoken name is not the best 
class, but instead has the second best distance score. 

The next step is to generate initial scores for all possible sets of one 
or two initials. For both the last name and the initials, the LB
algorithm keeps track of the best individual letter scores at each level.
This requires a reasonable amount of storage but leads to a very
efficient procedure for generating name scores. To generate a name
score, one merely backtracks the individual letter scores (for both last
name and initials) from the appropriate memory stacks, and a total
name score is generated as

$$D_{\mathrm{NAME}} = \frac{D_{LN} \cdot L_{LN} + D_I \cdot L_I}{L_{LN} + L_I}, \tag{1}$$

where $D_{LN}$ and $D_I$ are the normalized distances for the last name and
initials, and $L_{LN}$ and $L_I$ are the number of letters in the last name and
initials.
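
As a check on the name score formula, the sketch below applies it to the Table III example. It uses the class distances of Table IIIa as stand-ins for the letter-level distances, which is an assumption made for illustration; `name_score` is a hypothetical function name.

```python
def name_score(d_last: float, n_last: int, d_init: float, n_init: int) -> float:
    """Length-weighted average of last name and initials distances."""
    return (d_last * n_last + d_init * n_init) / (n_last + n_init)


# ZBOYAN (class 112223, distance 0.224) with initials AM (class 23, 0.232)
print(name_score(0.224, 6, 0.232, 2))
# ZBOYAN with initials DL (class 13, distance 0.267)
print(name_score(0.224, 6, 0.267, 2))
```

The two values agree, to three decimal places, with the overall name distances of 0.226 and 0.235 listed in Table IIIb.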


2.4 Stopping criteria 


As we discussed above, the process of recognizing a name from 
spoken spelled letters consists of: 

1. Running the LB on the last name and initials. 

2. Generating all possible last name class scores. 

3. Sorting the list of last name class scores. 

4. Examining the sorted last name class score list sequentially and
generating names and name scores for all names in each class that is 
examined. Name scores are sorted directly into a name score list. 

5. Continuing this procedure until a stopping criterion is satisfied. 
The stopping criterion is that the best possible name score for a given 
class exceeds the best actual name score (based on previously checked 
names) by a given threshold. The best possible name score for a given 
class is given by 


$$D_C = \frac{D_{LNC} \cdot L_{LNC} + D_I \cdot L_I}{L_{LNC} + L_I}, \tag{2}$$

where $D_{LNC}$ is the last name class score for the class being examined,
$L_{LNC}$ is the number of letters in the last name class score, $D_I$ is the
best possible initials score (as determined by the LB output on the
initials), and $L_I$ is the number of initials corresponding to the best
initials score. The threshold used (in our simulation) for the stopping
criterion was the value 0.06.

6. Once the stopping criterion was satisfied, the system returned
the sorted list of name scores, and the recognized name was chosen
as the one with the smallest distance.
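
Steps 3 through 6 above can be sketched as a simple loop. This is an illustrative reading of the procedure, not the authors' implementation: `names_in_class` and `score_name` are hypothetical callbacks standing in for the directory lookup and the name score generator, and class strings are assumed to be unpadded so that their length equals the number of letters.

```python
def best_possible(d_class, n_class, d_init, n_init):
    """Lower bound on any name score in a class, as in Eq. (2)."""
    return (d_class * n_class + d_init * n_init) / (n_class + n_init)


def search_names(class_scores, best_initials, names_in_class, score_name,
                 threshold=0.06):
    """Scan classes in order of distance until the stopping criterion is met.

    class_scores: list of (class_string, distance) pairs.
    best_initials: (distance, number_of_initials) for the best initials match.
    Returns a list of (name_distance, name), best first.
    """
    d_init, n_init = best_initials
    scored = []
    for cls, d_cls in sorted(class_scores, key=lambda pair: pair[1]):
        bound = best_possible(d_cls, len(cls), d_init, n_init)
        # Stop once no name in this class could come within the threshold
        # of the best name found so far.
        if scored and bound > min(s for s, _ in scored) + threshold:
            break
        for name in names_in_class(cls):
            scored.append((score_name(name), name))
    return sorted(scored)
```

Because the classes are visited in order of increasing distance, every later class has an even larger bound, so stopping at the first class that fails the test is safe.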

Table IIIb shows the results of running steps 4 and 5 on the sequence 
of last name and initial scores used to generate the data of Table IIIa. 
Only two names had distance scores within the threshold; the best
score corresponded to the actual spoken name in this case.

III. EXPERIMENTAL EVALUATION


To evaluate the performance of the directory listing retrieval system 
described in Section II, four talkers (three male, one female) each 
trained the recognizer using the robust training procedure to give the
isolated and embedded templates for each letter. The four talkers were 
all experienced users of speech recognition systems. These same talk- 
ers each provided a test set of 50 randomly chosen names from the 
18,210-name directory (the set of 50 names was different for each 
talker). Each name in the test set was spoken as a sequence of 
connected letters for the last name, followed by a pause, followed by a 
sequence of connected letters for the initials. The talkers spoke each 
name at a normal rate. All recordings were made over local dialed-up 
telephone lines. The average talking rates for the four talkers are 
shown in Table IV. The average rates for the last name vary from 189 
words per minute (wpm) to 218 wpm; for the initials the rates vary 
from 140 to 167 wpm. Thus, the names were spoken at very fast rates 
of articulation. 


IV. RECOGNITION RESULTS—SPEAKER-TRAINED CASE 


The directory listing retrieval system of Section II was run on the 
200 names by the four talkers in a speaker-dependent mode. The LB 
parameters (see Ref. 9 for a complete description of these parameters) 
were set to: 

1. $\epsilon$ = width of DTW search region = 99

2. $M_T$ = multiplier for interlevel scores = 2.2

3. $\delta_{END}$ = search region at end of string = 4

4. $\delta_{R1}$ = number of frames that can be skipped at beginning of
reference template = variable

5. $\delta_{R2}$ = number of frames that can be skipped at end of reference
template = variable

6. Inserted silence at beginning or end of reference template =
variable.

Table IV—Talking rates (wpm) for the four talkers

    Talker    Last Name Rate (wpm)    Initials Rate (wpm)
    1         189                     140
    2         210                     142
    3         210                     159
    4         218                     167

470 TECHNICAL JOURNAL, MARCH 1984

The values for $\delta_{R1}$ and $\delta_{R2}$ were made variable with the templates;
i.e., different values were used for the isolated pattern than for each
of the embedded patterns. In particular we considered three sets of $\delta_{R1}$
and $\delta_{R2}$ values, namely:

1. $\delta_{R1}$ = (0, 0, 0), $\delta_{R2}$ = (4, 0, 0)

2. $\delta_{R1}$ = (4, 0, 0), $\delta_{R2}$ = (6, 0, 0)

3. $\delta_{R1}$ = (4, 2, 0), $\delta_{R2}$ = (6, 3, 0),

where the first value is for the isolated pattern, the second value is for
the noncoarticulated pattern, and the third value is for the coarticulated
pattern. The first set of values was the optimal one for speaker-independent
patterns for digits;’° the second set of values was optimal
for speaker-dependent patterns for digits;'* the third set of values is a
compromise that provides some template shortening for the noncoarticulated
reference patterns.

The inserted silence parameter is the number of frames of silence 
put at either the beginning or end of templates to reflect the presence 
of initial or final stops in the word. The letters of the alphabet for 
which initial silence was used were b, d, g; and p, t, k, and q. None of 
the letters used final silence insertion. Based on some preliminary 
experimentation, three sets of silence values were used, namely: 

1. 0 for {b, d, g}, 0 for {p, t, k, q} 

2. 2 for {b, d, g}, 3 for {p, t, k, q} 

3. 4 for {b, d, g}, 6 for {p, t, k, q}, 
where the parameter values are in terms of frames; hence, four frames
correspond to 60 ms of silence insertion.

A series of recognition tests were performed in which 6p; and dpe 
were varied along with the silence insertion variable. For each of these 
tests, name recognition accuracy was measured for three sets of 
speaker-dependent reference templates,* namely: 

1. IS = Isolated templates alone 

2. IS ® NC = Isolated plus noncoarticulated embedded templates 

3. IS ® NC & CO = Isolated plus both types of embedded templates. 
Results for each of the nine sets of variables and for the three template 
sets are given in Table V. The results given in this table show the 
following: 

1. For the IS template set, values of (0, 0, 0) and (4, 0, 0) for 6,1 
and dre lead to extremely poor performance. For all other choices of 
parameter values, the system performance is essentially identical at 
86 percent + 1 percent. These results point out the necessity of being 
able to skip some reference frames at the beginning of each template. 

2. For the IS ® NC template set, there is only a small variation in 
performance (from a low of 91 percent to a high of 94 percent) as the 


* Runs were also made with other combinations of reference patterns (e.g., NC 
templates alone), but the three sets discussed below provided the most interesting and 
informative results. 

Table V—Recognition accuracy as a function of
silence parameters

                          δ_R1 and δ_R2 Values

                  δ_R1:   (0, 0, 0)    (4, 0, 0)    (4, 2, 0)
Template Set      δ_R2:   (4, 0, 0)    (6, 0, 0)    (6, 3, 0)

(a) Values of 0 for {b, d, g} and 0 for {p, t, k, q}

IS                          53.5         86           86
IS ⊕ NC                     92           93           94
IS ⊕ NC ⊕ CO                93           94           91.5

(b) Values of 2 for {b, d, g} and 3 for {p, t, k, q}

IS                          50.5         87           87
IS ⊕ NC                     92.5         92.5         91
IS ⊕ NC ⊕ CO                94           93.5         95

(c) Values of 4 for {b, d, g} and 6 for {p, t, k, q}

IS                          45.5         85           85
IS ⊕ NC                     91           93           92
IS ⊕ NC ⊕ CO                91.5         94           94


δ_R1, δ_R2, and silence parameters are varied. When we compare the
performance to that obtained from the IS template set, we can see 
that the inclusion of the embedded NC templates leads to improved 
performance, as well as to insensitivity to the exact parameter values 
of the variable system parameters. 

3. For the IS ⊕ NC ⊕ CO template set, only very slight improvements
in performance are obtained over that of the IS ⊕ NC template
set, and these improvements do not occur for all values of the variables.
The highest accuracy obtained is 95 percent; however, four different 
sets of parameter values yield 94-percent name accuracy. 

A further analysis of the results for the best parameter set is given 
in Table VI. Included in this table are the individual recognition 
accuracies as a function of candidate position (top 6 candidates, 6 = 
1, 2, 3, 4, 5) for each talker (along with the average), as well as two 
measures of the amount of searching performed to find the best name. 
The search measures are C, the average number of last name classes
searched, and N, the average number of names whose distance score
was evaluated. The results in Table VI show that two of the talkers 
performed well using IS templates, but the other two talkers performed 
very poorly. For these other two talkers the inclusion of embedded 
templates led to improvements in performance. It can also be seen 
that a 2- to 3-percent improvement in accuracy can be obtained by 
considering the second candidate position scores—i.e., about 2 to 3 
percent of the time the correct name is in second position. Such cases 
are typically names with slight (within letter class) errors in the 
initials. 

By examining the search statistics we see that the average search 

Table VI—Percentage of individual recognition accuracies as a
function of candidate position and search statistics for each talker for
three template sets using δ_R1 = (4, 2, 0) and δ_R2 = (6, 3, 0), and
silence for {b, d, g} = 2 and {p, t, k, q} = 3

                  Candidate Position

Talker       1      2      3      4      5       C       N

(a) Results Using IS Template Set

1           76     82     84     84     86     114     1976
2           80     82     82     86     86     139     2553
3           94     96     96     96     96      36      671
4           98     98     98    100    100      37      665
Average     87     89.5   90     91.5   92      81.5   1466

(b) Results Using IS ⊕ NC Template Set

1           82     88     88     88     88     111     2086
2           94     96     98     98     98     113     2308
3           94     98     98     98     98      10      203
4           94     94     94     96     96      40      675
Average     91     94     94.5   95     95      68.5   1318

(c) Results Using IS ⊕ NC ⊕ CO Template Set

1           94     98     98     98     98      56     1011
2           92     94     94     94     94      70     1416
3           96     98     98     98     98      19      352
4           98     98     98    100    100      16      296
Average     95     97     97     97.5   97.5    40.3    769


time for the three-template-per-word set is about one-half that of the
IS template set. Hence, the embedded templates yield considerably
more accurate name classes than those obtained from the IS templates
alone.

An analysis of the 10 name errors (in first candidate position) for
the IS ⊕ NC ⊕ CO template set of Table VIc shows the following:

1. Four of the errors were due to errors in initials of people with
the same last name. In all cases these were within-letter-class
errors, e.g., JR confused with AR.

2. Four of the errors were due to a known flaw in the level-building
algorithm, in which the second best path to a given frame need not
be the optimal second best path. Such cases could be potentially 
corrected at the expense of greatly increased computation in the LB 
algorithm. 

3. Two of the errors were names that could not be matched by the 
individual letter patterns. Such cases were highly coarticulated letter 
sequences whose matches were extremely poor in the LB algorithm. 


4.1 Recognition results—speaker-independent case 


The same set of 200 names was used as a test of the connected letter 
directory listing retrieval system in a speaker-independent mode. For 

Table VII—Percentage of individual recognition accuracies as a
function of candidate position and search statistics for each talker for
the speaker-independent case

                  Candidate Position

Talker       1      2      3      4      5       C       N

1           88     90     94     94     96      29      473
2           70     80     82     88     88     185     3137
3           84     90     90     92     92     110     2097
4           66     74     74     76     78     174     3011
Average     77     83.5   84     87     88     124.5   2179


this test the letter reference patterns were a set of 12 isolated templates 
per letter, the templates having been extracted from a clustering 
analysis of isolated occurrences of each letter by 100 talkers (50 male, 
50 female). 

The results of the recognition test are given in Table VII, which 
gives recognition accuracy as a function of candidate position for each 
of the four talkers (as well as the average), and the average search 
statistics. Overall, we can see that degraded performance results from 
the use of only isolated templates. An average name recognition 
accuracy of 77 percent is achieved, as opposed to the 95-percent 
accuracy in the speaker-dependent case. The inherent system difficulties
are illustrated in the average search statistics, which show that it
took about 124 class evaluations and 2179 name evaluations to find
the best name, roughly three times as many as required for the
speaker-trained case.

An analysis of the 46 name errors (out of the 200 names) shows: 

1. Nine errors where only the initials were incorrect. 

2. Twenty-two errors in which the last name was in error—i.e., a 
match to an incorrect last name was better than the match to the 
correct last name. In such cases the match to the initials was not able 
to correct the errors. 

3. Fifteen errors in which the correct name could not be matched 
because of the LB flaw discussed earlier, or because the rate of 
articulation of the letters exceeded the rate at which a match could be 
achieved from isolated letter templates. 


V. DISCUSSION 


To get some perspective on the relevance of the results given in 
Section III, we must compare the current system performance against 
that achieved in earlier implementations subject to the experimental 
constraints of small sample populations (i.e., the use of only four test 
talkers). In the most relevant comparison, Myers and Rabiner studied
a similar system in which a fixed set of 50 names was used as the
test by each of four talkers (different from those used here). Using
isolated templates alone, recognition accuracies of 90.5 percent and 
87.5 percent were achieved in the speaker-dependent and speaker- 
independent modes, respectively, for normally spoken names. Earlier 
work by Aldefeld et al. has shown that the 50 chosen names tended to
give an upper bound on true system performance, as they did not
adequately represent the common problems in name spelling
(e.g., multiple names with differing initials, names differing in a single
letter, etc.). Hence the recognition accuracies of 87 percent and 77
percent for the case of isolated templates in the speaker-dependent 
and speaker-independent modes are comparable to those reported 
earlier. 

The recognition performance using embedded training patterns in 
the speaker-dependent case indicates an improvement in recognition 
accuracy along with a significant reduction in search time to find the 
best name. Hence we conclude that, as in the connected digits recog- 
nition task, the use of embedded word training can and does enhance 
the recognition system performance. 

At this point it is natural to ask “Where do we go from here to 
improve the performance of a connected word recognition task based 
on pattern matching techniques?” There are two obvious directions 
for making improvements. 

First, the embedded training procedure should be able to provide 
more than a single embedded pattern when necessary. For example, it 
was clear that some letters are more influenced by context than others.
In such cases the resulting robust embedded pattern was often a poor 
representative of the letter, and did not yield good matching scores in 
real names. Using two or more embedded tokens obtained via some 
clustering procedure would have aided greatly in such cases. 

A second obvious direction for improvement is to make no decision 
on names in which two or more name candidates are within some 
reasonable distance of the best name score. In such cases (mostly 
involving initial errors) additional information should be requested 
from the talker to help resolve ambiguity. Such a strategy was used 
with great success by Aldefeld et al. in a practical implementation of
this system with isolated letter input. 


VI. SUMMARY 


We have shown how a practical directory listing retrieval system 
could be implemented on the basis of connected letter name spelling. 
Our results indicate that improved recognition performance can be 
obtained when combining embedded letter patterns (suitably extracted 
from three-letter strings) with the standard isolated letter patterns to 
form an enhanced letter reference set. Using this combined set of 
references yields an 8-percent improvement in name accuracy and a
factor-of-two reduction in search time when the system is tested
in a speaker-trained mode using dialed-up telephone line recordings.

Accurate location of the endpoints of spoken words and phrases is important
for reliable and robust speech recognition. The endpoint detection problem is 
fairly straightforward for high-level speech signals in low-level stationary 
noise environments (e.g., signal-to-noise ratios greater than 30-dB rms). 
However, this problem becomes considerably more difficult when either the 
speech signals are too low in level (relative to the background noise), or when 
the background noise becomes highly nonstationary. Such conditions are often 
encountered in the switched telephone network when the limitation on using 
local dialed-up lines is removed. In such cases the background noise is often 
highly variable in both level and spectral content because of transmission line 
characteristics, transients and tones from the line and/or from signal genera- 
tors, etc. Conventional speech endpoint detectors have been shown to perform 
very poorly (on the order of 50-percent word detection) under these conditions. 
In this paper we present an improved word-detection algorithm, which can 
incorporate both vocabulary (syntactic) and task (semantic) information, 
leading to word-detection accuracies close to 100 percent for isolated digit 
detection over a wide range of telephone transmission conditions. 


I. INTRODUCTION 


In an automatic speech recognition system, it is assumed that during 
a recording interval (which may be continuous) a user will speak a 
command, which he wants the recognizer to interpret and respond to 


* AT&T Bell Laboratories. 


Copyright © 1984 AT&T. Photo reproduction for noncommercial use is permitted with- 
out payment of royalty provided that each reproduction is done without alteration and 
that the Journal reference and copyright notice are included on the first page. The title 
and abstract, but no other portions, of this paper may be copied or distributed royalty 
free by computer-based and other information-service systems without further permis- 
sion. Permission to reproduce or republish any other portion of this paper must be 
obtained from the Editor. 




accordingly. The first task of a recognition system is to separate the 
input speech from the various types of nonspeech events that also 
occur during the recording. This task is referred to as endpoint 
detection. 

Accurate detection of the spoken word has been shown to be crucial 
for reliable word recognition. Most research on the design of
endpoint detectors has used speech databases in which the speech was
collected over clean transmission media [using close-talking,
noise-canceling microphones, or telephone speech over local Private 
Branch Exchanges (PBXs)]. The signal-to-noise ratio (s/n) under 
these conditions is high (between 35 and 50-dB peak s/n). Also, the 
noise generated in such a system is usually stationary. This research 
has led to quite reliable endpoint detectors. 

Endpoint detection becomes much more difficult when the trans- 
mission system is corrupted by the many noises one finds on a 
standard, dialed-up telephone line. Some of these problems include 
popping sounds, crackling noises, carrier frequency tones, background 
speech, and other nonstationary noises. An accurate
speech endpoint detector that works as well in these environments as
in clean environments is a goal that has not been met. In an earlier
study, when telephone customers, speaking over randomly dialed 
telephone lines with various types of transmission distortion, were 
asked to speak their telephone number as a sequence of isolated digits, 
existing endpoint algorithms often failed. 

To evaluate a new endpoint scheme, we must define the requirements
on endpoint accuracy. An indirect measure of these requirements
can be obtained from the recognizer itself, as follows. Given
a test set of many spoken words, use them as input to a word 
recognition system consisting of an endpoint detector and a recognizer. 
If, when substituting a new endpoint detection algorithm for an earlier 
one, we obtain higher word recognition accuracy, then we will say that 
the new endpoint detection algorithm is better than the earlier one. 

One way of explicitly defining the requirements on endpoint accu- 
racy is to perform the following experiment. Take a speech database 
of isolated words and manually detect the beginning and end of each 
word. Next, vary the beginnings and ends of each word over some 
specified range (e.g., ±150 ms) and perform isolated word recognition.
By examining sensitivity of the recognition scores to variability in 
endpoints, an explicit relationship can be found. Such an experiment 
was performed and will be described in this paper. 

The purpose of this paper is to describe a new approach for deter- 
mining the endpoints of spoken words, which incorporates both vo- 
cabulary (syntactic) and task (semantic) information, leading to word- 
detection accuracies close to 100 percent for isolated digit detection 
over a wide range of telephone conditions. We call the new approach
a top-down design. Simply put, we look for strong (vowel-like) peaks 
in the energy contour of a speech utterance and process the speech 
around the peaks to find potential beginning and ending points. 
Several rules involving duration, and onset and decay times are then 
used to refine the endpoint estimates. 

This new algorithm is compared to an earlier endpoint algorithm by 
Lamel et al., which tries to find word endpoints based on the energy
of the speech rising some fixed level above the background noise 
energy. We call this type of approach a bottom-up approach. In 
addition we will briefly identify several other algorithms that were 
investigated, none of which performed as well as the top-down ap- 
proach. 

The format of this paper will be as follows. In Section II we review 
the bottom-up approach to endpoint detection. We describe our new 
top-down word detector in Section III. In Section IV we describe the 
database used in all our tests, present results on the tests to explicitly 
measure requirements for word-detection accuracy, and give recogni- 
tion results comparing the bottom-up approach to the new top-down 
method. 


II. REVIEW OF BOTTOM-UP ENDPOINT DETECTOR


Figure 1 gives a block diagram of the bottom-up endpoint algorithm 
of Lamel et al. First the input speech is bandpass filtered and sampled.
Since we are working with a telephone bandwidth signal, we bandpass 
the speech signal from 100 to 3200 Hz and sample it at a 6.67-kHz 
rate. The digitized speech is then preemphasized using a simple first-
order digital filter with the z transform


    $H(z) = 1 - a z^{-1}$,    (1)


where a = 0.95. The digitized speech is then blocked into frames of N
samples, with a shift between frames of L samples. Experimentation
has found that N should be set to 300 samples and L should be set to
100 samples. This corresponds in time to 45-ms frames with a 15-ms
shift between frames. Each frame of speech is then weighted by a
Hamming window of the form


    $w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right)$,  $0 \le n \le N-1$    (2)


(where N is as previously defined). Windowing reduces the truncation
effects of the framing procedure. We denote the ℓth frame of windowed
speech as $s_\ell(n)$, defined for $0 \le n \le N-1$.
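
The front-end steps just described (preemphasis, framing, and Hamming weighting) can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, the sample before the first one is taken as zero, and trailing samples that do not fill a whole frame are simply dropped.

```python
import math

def preemphasize(x, a=0.95):
    """Apply H(z) = 1 - a z^-1, i.e. y[n] = x[n] - a*x[n-1].
    Assumption: the sample before x[0] is taken to be zero."""
    return [x[n] - a * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)), 0 <= n <= N-1."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def windowed_frames(x, N=300, L=100):
    """Block x into N-sample frames shifted by L samples and window each one.
    Assumption: trailing samples that do not fill a frame are dropped."""
    w = hamming(N)
    frames = []
    for start in range(0, len(x) - N + 1, L):
        frames.append([x[start + n] * w[n] for n in range(N)])
    return frames

FS = 6670.0                      # nominal 6.67-kHz sampling rate, in Hz
frame_ms = 1000.0 * 300 / FS     # frame length in ms (about 45 ms)
shift_ms = 1000.0 * 100 / FS     # frame shift in ms (about 15 ms)
```

With N = 300 and L = 100 at the 6.67-kHz rate, the computed frame length and shift agree with the 45-ms and 15-ms figures quoted above.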

Fig. 1—Block diagram of bottom-up endpoint algorithm of Lamel et al.


After this initial digital processing, the energy of the signal is
computed. This computation can be made simply by summing the
squares of the signal values during a frame of speech. However, since
we are using a Linear Prediction Coding (LPC) recognizer, which
requires that a pth-order autocorrelation analysis (in our case p = 8)
be performed on the entire recording interval, the energy is extracted
as a by-product of the analysis. That is,


    $E(\ell) = 10 \log_{10} R_\ell(0)$,  $\ell = 1, 2, \ldots, NF$,    (3)


where NF is the total number of frames in the recording interval,
$R_\ell(0)$ is the zeroth-order correlation coefficient,


    $R_\ell(0) = \sum_{n=0}^{N-1} [s_\ell(n)]^2$,    (4)


and $E(\ell)$ is on a decibel scale.
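
As a minimal sketch of Eqs. (3) and (4), the log energy of each windowed frame can be computed directly from the sum of squares. The function names and the small `floor` value, which guards against taking the logarithm of zero on an all-zero frame, are assumptions made for this illustration.

```python
import math

def frame_log_energy(frame, floor=1e-10):
    """Return 10*log10(R(0)), where R(0) is the sum of squared samples.
    `floor` (an assumption) avoids log10(0) for an all-zero frame."""
    r0 = sum(s * s for s in frame)
    return 10.0 * math.log10(max(r0, floor))

def energy_contour(frames):
    """Log-energy contour in dB, one value per frame."""
    return [frame_log_energy(f) for f in frames]
```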

The next step in the processing (called adaptive-level equalization
in Fig. 1) is a normalization of the energy contour to compensate for
the mean background noise level. First, $E_{\min}$ is computed as

    $E_{\min} = \min_{1 \le \ell \le NF} E(\ell)$.    (5)

$\hat{E}(\ell)$ is then formed as the difference between $E(\ell)$ and $E_{\min}$,

    $\hat{E}(\ell) = E(\ell) - E_{\min}$,  $\ell = 1, 2, \ldots, NF$.    (6)


Next, the background level estimate is refined even further by computing
a histogram of the signal energies. The histogram is restricted
to the lowest NP dB (typically NP = 15) of $\hat{E}$. We then apply a three-
point median smoother to this histogram. Finally, we create the
modified energy contour $\tilde{E}(\ell)$, $\ell = 1, 2, \ldots, NF$,


    $\tilde{E}(\ell) = \hat{E}(\ell) - \mathrm{Mode}$,    (7)


where Mode is the mode of the smoothed histogram generated above.
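
The adaptive-level equalization of Eqs. (5) through (7) might be sketched as follows. The text does not give the histogram bin width, so 1-dB bins are assumed here. Taking the lower edge of the most populated bin as Mode, resolving ties toward the lowest bin, and leaving the end bins unsmoothed are likewise assumptions of this sketch.

```python
from collections import Counter

def median3(values):
    """Three-point running median; the two end points are left unchanged."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = sorted(values[i - 1:i + 2])[1]
    return out

def equalize(energy_db, NP=15):
    """Normalize a log-energy contour to its background-noise mode."""
    e_min = min(energy_db)                          # Eq. (5)
    e_hat = [e - e_min for e in energy_db]          # Eq. (6)
    # Histogram of the lowest NP dB, using 1-dB bins (an assumption).
    counts = Counter(int(e) for e in e_hat if e < NP)
    hist = [counts.get(b, 0) for b in range(NP)]
    smooth = median3(hist)
    mode = smooth.index(max(smooth))                # lowest bin wins on ties
    return [e - mode for e in e_hat]                # Eq. (7)
```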

The remaining blocks of the bottom-up endpoint detector are the
energy pulse detector and an endpoint ordering procedure. The energy
pulse detector scans the modified energy contour and selects all
482 TECHNICAL JOURNAL, MARCH 1984 


potential energy pulses within the recording interval. Pulse-combining 
rules are used to eliminate short pulses, and combine close pulses. 
Several parameters, along with their current settings, need to be
defined in order to explain how these blocks operate. These parameters
include:

1. K1, K2, and K3 are energy thresholds used in determining the
word boundaries (3, 10, 5 dB).

2. IT1 and IT2 are frame counter thresholds for determining the
presence or absence of any breath noises at the boundary points of a
detected utterance (5, 5 frames).

3. IT3 is the minimum length for a detected pulse (5 frames).

4. NFMIN is the minimum length in frames for an utterance (10
frames).

Figure 2 shows a state representation of the operation of the energy 
pulse detector. The normalized energy contour of Eq. (7) is scanned
from left to right (ℓ = 1 to ℓ = NF). If the energy rises first above K1, then
above K2 (without falling below K1), a beginning pulse marker is
assigned to frame ℓ. Similarly, when the energy dips below K3, an
ending marker is assigned. The beginning IT1 frames and ending IT2 
frames are checked for breath-type noises (i.e., low energy content 
throughout the IT1 or IT2 frames), and eliminated if necessary. All 
pulses must have a minimum length (IT3). Pulses are then combined 
based on their proximity to other pulses. All final pulses are checked 
for duration and maximum energy content, and pulses that do not 
pass are eliminated. The final output of the endpoint detector is a set 
of ordered pairs of beginning and ending points of segments within 
the recording interval. It is assumed that each segment corresponds 
to a spoken word within the recording interval. The Lamel et al. 
bottom-up endpoint detector can be, and has been, implemented as a 
real-time endpoint detector. 
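
The left-to-right scan described above can be sketched as a three-state machine. This is a simplified illustration rather than the Lamel et al. detector itself: the breath-noise checks (IT1, IT2), the pulse-combining rules, and the final duration and energy checks are omitted. Placing the beginning marker at the frame where the contour first crossed K1 is an assumption, and the function and state names are hypothetical.

```python
def bottom_up_pulses(e, K1=3.0, K2=10.0, K3=5.0, IT3=5):
    """Scan the normalized contour e (dB) from left to right and return
    a list of (begin_frame, end_frame) pairs for the detected pulses."""
    pulses = []
    state = "silence"
    begin = None
    for i, v in enumerate(e):
        if state == "silence" and v >= K1:
            state, begin = "rising", i          # tentative beginning (assumed at K1 crossing)
        if state == "rising":
            if v < K1:
                state, begin = "silence", None  # fell back below K1: false start
            elif v >= K2:
                state = "pulse"                 # beginning confirmed
        elif state == "pulse" and v < K3:
            if i - begin >= IT3:                # enforce minimum pulse length
                pulses.append((begin, i - 1))
            state, begin = "silence", None
    # A pulse still open at the end of the recording is closed at the last frame.
    if state == "pulse" and len(e) - begin >= IT3:
        pulses.append((begin, len(e) - 1))
    return pulses
```

Because every decision depends only on the current frame and the current state, a scan of this kind lends itself to frame-by-frame (real-time) operation.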

Fig. 2—State representation of energy pulse detector from Lamel et al. endpoint
detector.


III. DESCRIPTION OF THE TOP-DOWN ENDPOINT DETECTOR


As discussed previously, the bottom-up endpoint detector works 
very well in stationary noise backgrounds with reasonably high signal- 
to-noise ratios. However, in highly variable noise background condi- 
tions it tends to fail at a very high rate. Hence, we now describe a top- 
down approach capable of finding words in highly nonstationary 
backgrounds. 

The design of the top-down endpoint detector is similar to that of 
the bottom-up approach in that it computes a normalized energy array, 
finds pulses in the recording interval, and then combines them to get 
the final endpoint decisions. The differences lie in the energy pulse 
detection and endpoint processing procedures. 

To understand the differences, we need to define some additional 
parameters along with their current settings, namely: 

1. MXWD is the number of utterances within a recording interval 
(7 words). 

2. IGAP is the number of frames from which a pulse slope is 
computed (3 frames). 

3. ISLOPE is the pulse slope threshold (7 dB). 

4. NSEP, NSEP2 are pulse separation counters (2,7 frames). 

Fig. 3—Block diagram describing energy pulse detection procedure from top-down
endpoint algorithm.

Figure 3 gives a flow diagram of the energy pulse detection procedure.
The philosophy is to find the high energy frames in a local region
and then try to define the energy pulse boundaries using the lower
energy frames. In particular, the algorithm scans the entire recording
interval (i.e., the modified energy contour of Eq. (7) for ℓ = 1, 2, ..., NF)
until it finds the frame with
the highest energy. The algorithm then analyzes the energy values of 
the surrounding frames. It looks at frames prior to the maximum 
energy frame until it finds a frame with energy less than the threshold 
K1, and it looks at frames beyond the maximum energy frame until it 
finds a second frame with energy less than the threshold K3. At this 
point the pulse detector has found a set of possible beginning and 
ending frames for an utterance—i.e., an energy pulse. Its next task is 
to try to eliminate any breath noises at the estimated boundaries of 
the energy pulse. This is performed by testing the first IT1 frames 
and last IT2 frames of the energy pulse for consistently low energy 
content. Next, the detected pulse (corresponding to the utterance) is 
checked to guarantee that its duration is greater than a minimum- 
length threshold and that its amplitude is above a minimum level. 
Pulses are eliminated if they do not pass these tests. This procedure 
is iterated throughout the recording interval. All previously detected 
pulses are eliminated from consideration in each new iteration. When 
this process is complete, a set of NPULSE pulses are found within the 
recording interval. Figure 4a shows a typical energy plot of a string of 
isolated digits indicating where pulses were detected. In this example, 
six energy pulses were detected; however, there are only four spoken 
digits in the recording interval. 
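
The peak-first search described above can be sketched as follows. It is an illustration under assumptions, not the authors' program: breath-noise trimming (IT1, IT2) is left out, the minimum duration is taken as 5 frames and the minimum peak level as 10 dB, and frames already assigned to an earlier pulse act as hard boundaries. All names are hypothetical.

```python
def top_down_pulses(e, K1=3.0, K3=5.0, min_len=5, min_peak=10.0):
    """Repeatedly take the highest-energy unclaimed frame and grow a pulse
    around it. Returns (begin_frame, end_frame, peak_db) tuples in time order."""
    n = len(e)
    claimed = [False] * n
    pulses = []
    while True:
        free = [i for i in range(n) if not claimed[i]]
        if not free:
            break
        peak = max(free, key=lambda i: e[i])
        if e[peak] < min_peak:
            break                               # nothing strong enough remains
        b = peak                                # extend backward until energy < K1
        while b > 0 and not claimed[b - 1] and e[b - 1] >= K1:
            b -= 1
        t = peak                                # extend forward until energy < K3
        while t < n - 1 and not claimed[t + 1] and e[t + 1] >= K3:
            t += 1
        for i in range(b, t + 1):
            claimed[i] = True                   # exclude from later iterations
        if t - b + 1 >= min_len:
            pulses.append((b, t, e[peak]))
    return sorted(pulses)
```

Each pass claims at least the peak frame, so the loop always terminates. Weak or short pulses are claimed but not reported, which mirrors the elimination step in the text.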

The energy pulses are next sent to a pulse combiner algorithm, 
which attempts to combine two or more adjacent pulses to form longer 
pulses. This process works as follows. First, all pulses are sorted in 
order of decreasing peak energy. We then start with the pulse with 
the highest peak energy and try to add pulses to it based on the 
following rules. 

For a prior pulse to be added to the beginning of the current energy
pulse, first, the Downward Slope (DS) of the pulse, defined over its last
IGAP frames, must be above a threshold. Such a
sharp downward slope tends to occur during stop-gap-type pulses 
within a word. Second, the prior pulse must lie within NFW frames 
of the current pulse (where NFW is determined by DS). If these 
conditions occur, the prior pulse will be combined with the current 
pulse to give a single combined pulse. In a similar manner, a pulse can 
be added to the end of the current energy pulse. In addition to the 
slope constraint, there are other restrictions for combining pulses. The 
duration of the combined pulse must be below a maximum-length 
threshold. (Clearly, if the combining pulse duration is too long, it 
signifies that two distinct words were spoken close to each other, and 
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Fig. 4—Log energy contour from spoken string of isolated digits. Dashed lines indicate 
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ound in (a). 


hence, should not be combined.) This restriction is not applied when 
the algorithm is detecting connected words as well as isolated word 
sequences. A second restriction is that the upward slope value (defined 
similarly to the downward slope value) between two combining pulses 
must also be above a threshold. This situation typically signifies a 
stop gap within a word. Figure 4b shows the result of applying the 
pulse combiner rules to the spoken sequence of Fig. 4a. Pulses 3 and 
4 have been combined (this is the digit six), as have pulses 5 and 6 
(this is the digit eight). 


3.1 Syntactic constraints of digits 


In the final decision block, the first task is to eliminate all endpoint 
utterances that are too short (i.e., less than the threshold NFMIN). 
The algorithm could terminate here, with the final output being a list 
of beginning and ending frame pointers for each detected utterance. 
However, we have incorporated several decision rules based on the
knowledge that we are detecting digit strings. For this special vocabulary,
only two words (the digits six and eight) can possibly contain a
stop gap. Thus, all other words in the vocabulary can be represented 
by a single energy pulse with no other pulses attached. Also, for the 
digits six and eight, the maximum energy pulse is always the first pulse 
when a secondary pulse is added. Given the rules for combining pulses, 
this implies that no pulse should be added to the beginning of a 
maximum energy pulse. Next, both the digits six and eight have at 
most only one stop gap present, implying that at most one pulse can 
be added to the end of a maximum energy pulse. By adding these 
additional rules to the endpoint detector, we can increase overall 
accuracy for this specialized vocabulary. 


3.2 Semantic constraints from the digit recognition task 


We further assume that the input speech is a sequence of MXWD 
isolated digits (e.g., MXWD equals seven for a telephone number).
Thus, for this specialized case, we know the number of utterances 
within the recording interval. This information can also be incorpo- 
rated into the algorithm. One way to implement this idea is to sort 
the final endpoint detector output in order of maximum peak energy 
level and to retain the top MXWD utterances. If the output of the 
pulse combiner indicates that fewer than MXWD words were found, 
we assume that some of the uttered words were spoken as connected 
sequences rather than as isolated words. This is because the pulse 
detector has its parameters set to find any spoken utterance with a 
peak energy greater than 10 dB (note that the average peak energy for
utterances recorded previously is between 30 and 50 dB).
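
One way to apply the known word count, as suggested above, is to rank the detected utterances by peak energy and keep the strongest MXWD. The sketch below assumes each utterance is a (begin frame, end frame, peak dB) tuple and returns the survivors in time order. Both the tuple layout and the time ordering of the result are assumptions of this illustration.

```python
def keep_strongest(utterances, mxwd=7):
    """utterances: list of (begin_frame, end_frame, peak_db) tuples.
    Keep the mxwd utterances with the highest peak energy and
    return them sorted by beginning frame (an assumption)."""
    strongest = sorted(utterances, key=lambda u: u[2], reverse=True)[:mxwd]
    return sorted(strongest, key=lambda u: u[0])
```

If fewer than MXWD utterances are supplied, all of them are returned, which corresponds to the case the text attributes to connected rather than isolated speech.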


IV. EVALUATION OF THE TOP-DOWN APPROACH TO ENDPOINT 
DETECTION 


To evaluate the top-down endpoint detector, a series of experiments 
were performed using telephone recordings from a subset of the data 
described in Ref. 1. This database consisted of 11,035 digits spoken by 
3153 people in highly variable telephone transmission conditions. For 
evaluation purposes we used a subset of 820 digits spoken by 218 
talkers. This particular subset of data was used because its statistics 
were similar to those of the entire 11,035-digit database. Also, the 
experiments we planned to perform were so computationally intensive
that we wanted to choose a small subset of the database.

For recognition purposes we used the 30-template-per-digit refer- 
ence set used in Ref. 1. These templates were extracted from a subset 
of 3700 digit tokens using the Unsupervised Without Averaging 
(UWA) clustering algorithm.

The first experiment concerned direct measurement of recognition 
accuracy as a function of error (as measured with respect to hand-
chosen endpoints) in endpoint location. The second set of experiments
compared the bottom-up and top-down approaches on the 820-word 
test vocabulary. 


4.1 Recognition as a function of endpoint location error 


Based on the energy contour of the recording interval and on careful 
listening to the speech, we manually determined the endpoints for 
each of the 218 strings (to the nearest 15-ms interval). Figure 5 shows 
some typical speech utterances along with their manually determined 
endpoints. These examples show some of the typical problems asso- 
ciated with endpoint detection of this database. Figure 5a, which shows 
the energy contour for the string /391/, exhibits a nonstationary noise 
floor, where a person was talking in the background. Figure 5b, which 
shows the contour for the digit string /8292/, exhibits transients that 
were introduced by the transmission system (i.e., P1, P2, P3, P4 are the
transients, while A1, A2, A3, A4 are the actual spoken digits). Note that
the peak s/n is 31 dB, but the s/n’s of most of the individual digits are 


Fig. 5—Log-energy contour for the spoken strings (a) /391/ and (b) /8292/. Solid
lines indicate manual placement of word endpoints. (Axes: energy in decibels
versus frame number.)
lower (e.g., the second digit, two, has a peak s/n of about 12 dB). After
the endpoints were manually determined, recognition was performed
on the isolated digit database. The recognizer used was the LPC-based
recognizer, which has been used and studied extensively at AT&T
Bell Laboratories. A simple K-nearest neighbor decision rule was used 
in all tests. The overall recognition accuracy obtained was 93.0 percent 
using the manual endpoints. 

After recognition was performed, the manually detected beginning 
point and ending point of each word were automatically varied in 15- 
ms (single-frame) steps from 150 ms before the manually determined 
endpoint to 150 ms after the endpoint. Recognition was performed at 
each interval with the results tabulated in the form of a contour plot. 
Figure 6 shows a contour plot of overall recognition accuracy as a 
function of the change in the endpoint position in ms. Each ring 
represents a 1-percent change in recognition accuracy. The contour 
plot was obtained by averaging over all digits in the test database. 
Figure 6 shows, as anticipated, that the best recognition score, 93.0 
percent, was obtained when the exact manually determined endpoints 
were used. 
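
The endpoint perturbation experiment described above can be pictured as a sweep over a grid of endpoint shifts. The following is a minimal sketch of that grid only, assuming that the beginning and ending points are shifted jointly (as the two-axis contour plot suggests); the names `endpoint_offsets` and `offset_grid` are hypothetical and are not taken from the original system.

```python
FRAME_MS = 15        # one analysis frame, as stated in the text
MAX_SHIFT_MS = 150   # largest shift applied on either side of the manual endpoint


def endpoint_offsets(frame_ms=FRAME_MS, max_shift_ms=MAX_SHIFT_MS):
    """Return the endpoint shifts in ms, from -max to +max in single-frame steps."""
    return list(range(-max_shift_ms, max_shift_ms + 1, frame_ms))


def offset_grid():
    """Return every (beginning shift, ending shift) pair in the assumed joint sweep."""
    shifts = endpoint_offsets()
    return [(begin, end) for begin in shifts for end in shifts]


if __name__ == "__main__":
    print(len(endpoint_offsets()), "shifts per endpoint,", len(offset_grid()), "grid points")
```

Each grid point would correspond to one recognition run over the database, and the pair (0, 0) corresponds to the unmodified manual endpoints.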


Fig. 6—Contour plot showing results of recognition experiment where manually 
placed endpoints were varied by ±150 ms. Results are averaged over all digits. Each 
ring represents a 1-percent change in recognition accuracy. 

The contour plot of Fig. 6 also implies that if the endpoints were 
varied only slightly from the hand-placed endpoints, the recognition 
accuracy would drop. For example, a 3-percent reduction in accuracy 
occurred if both the endpoints were in error by ±60 ms. We see that 
the rings of the contour plots are fairly concentric, implying a uniform 
decrease in recognition accuracy as the endpoints are placed further 
away from manually chosen ones. 

If we look at contour plots of the individual digits, we see that their 
rings are definitely not concentric, and that the best recognition 
accuracy from most of the digits was not obtained using the manually 
determined endpoints. Table I gives the best recognition scores on a 
per-digit basis, along with the changes that were made to the endpoints 
in order to obtain those results. Figures 7 through 9 show contour 
plots for some of the digits. We can make several observations from 
these curves. Figure 7 shows the contour plot for the word zero. The 
best accuracy for this word (averaged over all occurrences of the word) 
was obtained if the manually determined beginning points were moved 
in (i.e., closer to the ending point) by 30 ms. It can be seen that the 
digit zero is more sensitive to variations in the ending point than the 
beginning point. For the digit one (see Fig. 8), we see the best results 
(96.1 percent) were obtained if the ending point was moved out by 90 
ms (six frames). This is quite a large amount, and may be justified by 
the fact that the nasal sound at the end of the word one is of such low 
energy that, using the energy contour and human listening, accurate 
placement of the ending point cannot be made. This plot also shows 
that the beginning point of the digit one is much more sensitive than 
the ending point. If the beginning points are varied by —60 ms (from 
the optimal point), the recognition accuracy drops by 28 percent, but 
if the ending point is varied by —60 ms, the recognition accuracy drops 
only 1.5 percent. Figure 9 shows the contour plot for the word six. We 


Table I—Recognition results from test run with modified endpoints 

                        Percent    Change in Beginning    Change in Ending 
Digit                   Correct    Point (ms)             Point (ms) 
0                       90.4         30                      0 
1                       96.1          0                     90 
2                       94.9          0                    -60 
3                       95.6          0                      0 
4                       93.8          0                    -30 
5                       96.7          0                      0 
6                       95.7          0                    -90 
7                       97.2        -30                     30 
8                       97.1          0                    -30 
9                       88.9          0                     30 
Total over all digits   93.0          0                      0 

Fig. 7—Contour plot showing results of recognition experiment where manually 
placed endpoints were varied by ±150 ms. Results are only for the digit zero. Each ring 
represents a 1-percent change in recognition accuracy. 


see that the rings are highly nonuniform, with several local maxima 
present throughout the plot. The best recognition accuracy for six was 
obtained when the ending points were cut back by 90 ms (95.7 percent). 
However, a cutback in the ending points of only 30 ms coupled with a 
30-ms increase in the beginning points also yielded the same results 
(95.7 percent). Similar observations can be made for the rest of the 
digits. 

The main point to emphasize is that extremely accurate determi- 
nation of the speech endpoints must be made, in order to obtain the 
highest system accuracy using our LPC-based recognition system. 


4.2 Accuracy of automatically determined endpoints 


Endpoints were automatically determined for the 820-digit database 
using both the bottom-up and top-down approaches. Figure 10 shows 
a histogram of the error in frame location of the top-down endpoints 
compared to the manually determined endpoints for the 820 digits. 
Figure 10a shows results for the beginning frame; Fig. 10b shows 
results for the ending frame. The automatically determined endpoints 
agree with the manual endpoints within ±1 frame 68.2 percent of the 
time, within ±2 frames 78.5 percent, and within ±5 frames 90.0 percent 
for the combined beginning and ending points. 

Fig. 8—Contour plot showing results of recognition experiment where manually 
placed endpoints were varied by ±150 ms. Results are only for the digit one. Each ring 
represents a 1-percent change in recognition accuracy. 

Fig. 9—Contour plot showing results of recognition experiment where manually 
placed endpoints were varied by ±150 ms. Results are only for the digit six. Each ring 
represents a 1-percent change in recognition accuracy. 

Fig. 10—Histogram of error in frame location of top-down endpoints compared to 
manually determined endpoints for (a) beginning frame, and (b) ending frame. 

Figure 11 shows some examples of how the new automatic endpoint 
detector worked on several representative strings of digits. Shown are 
log-energy contours with dashed lines indicating where the algorithm 
determined the digit endpoint locations to have been. The string in 
Fig. 11a, /2226242/, is an example of speech spoken in the presence 
of highly variable background noise. The peak s/n for this example is 
21.7 dB; however, the s/n’s for most of the digits in the string are well 
below that figure. Under laboratory conditions (speech over a local 
PBX) the peak s/n is usually between 35 and 50 dB. The string in Fig. 
11b, /6854566/, is an example of how the endpoint detector is some- 
times able to split connected words (the 68 and 4566 are connected). 
The string in Fig. 11c, /4736354/, shows a fully connected string of 
digits. The first three digits /473/ were determined to be one utterance, 
and the next four digits, /6354/, were split into separate utterances. 
Finally, the string in Fig. 11d, /2294761/, shows that the new endpoint 
detector can work very well even under very poor background conditions. 
Note the extremely variable background noise level (peak s/n of only 
17 dB). 

Fig. 11—Log-energy contours from several representative digit strings showing how 
top-down endpoint detector performed. Dashed lines indicate where algorithm placed 
beginning and ending word markers. (a) String /2226242/ shows speech spoken over 
highly variable background noise. (b) String /6854566/ shows how endpoint detector 
can split connected words. (c) String /4736354/ shows a fully connected string of digits. 
(d) String /2294761/ shows speech carried over very noisy telephone lines. 


4.3 Recognition results using the bottom-up approach 


Recognition was run on the 820-digit database using the bottom-up 
endpoint detector. For this algorithm, only 68.5 percent of the 820 
digits were detected. Of the detected words, 85.2 percent were correctly 
recognized. 


4.4 Recognition results using the top-down approach 


The top-down endpoint detector was substituted for the bottom-up 
approach and the entire recognition process was repeated. The new 
endpoint detector found 800 of the 820 digits (97.6 percent). In looking 
further we found that 10 double words (connected) were found by the 
endpoint detector and 5 false alarms were also made. Therefore, the 
new endpoint detector actually found 805 of the 820 digits in the 
database (98.2 percent). The recognition accuracy on this test set was 
90.0 percent, with 711 of the 790 utterances correctly recognized. Since 
we are using an isolated word recognition system, the recognition 
errors produced by the 10 double utterances were removed. The 79 
errors in the isolated word recognition system were attributed to the 
following causes: 

1. For 10 words, the beginning point included too much of the 
background noise. 

2. For 8 words, the ending point included too much of the background 
noise. 

3. For 4 words, both endpoints were greatly in error. 

4. The endpoint detector failed to find the entire word in 12 cases. 
These all occurred for the digit six, where the /IX/ was left out. 

5. For the remaining 40 errors, the endpoint detector found the 
correct endpoints; however, the recognizer was unable to recognize the 
words. 

These errors can be thought of in two ways. Either they were 
attributable to the endpoint detector, or they were recognizer errors. Types 
1, 2, and 3 above are clearly endpoint detector errors. Type 5 is clearly 
a recognizer error. Type 4 can also be considered a recognizer error, 
as the template set has tokens for the word six without the final 
/IX/. 

If we were to compute a recognition error rate due entirely to the 
endpoint detector, the errors we would include would be types 1, 2, 
and 3 above, plus the 5 false alarms and the 15 words missed entirely 
by the endpoint detector. We are not including the 10 double words 
found because they clearly were connected words, which could possibly 
be recognized by a connected word recognition system. Therefore, a 
total of 42 errors out of a total of 805 utterances (790 utterances plus 
15 utterances not found) were due to the endpoint algorithm. This 
yields an endpoint detector hit rate of 94.8 percent. The corresponding 
recognizer accuracy, if we eliminate the endpoint error rate, would be 
93.3 percent (711 utterances correct out of a possible 763 words). 
Hence, our recognition results are comparable to those obtained from 
manual endpoints; however, we suffer a 5-percent error rate in digit 
detection. 
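
The bookkeeping behind the detection and hit-rate figures quoted in this section can be retraced with a few lines of arithmetic. The sketch below uses only the counts given in the text; the way they are combined (for example, 800 + 10 - 5 = 805 digits found, and false alarms counted among the 79 scored errors) is our reading of the passage and should be treated as an assumption. The function name `accounting` is hypothetical.

```python
# Counts quoted in Section 4.4 (top-down detector, 820-digit database).
TOTAL_DIGITS = 820
UTTERANCES_FOUND = 800
DOUBLE_WORDS = 10       # two connected digits detected as a single utterance
FALSE_ALARMS = 5
CORRECT = 711
ERRORS_BY_TYPE = {1: 10, 2: 8, 3: 4, 4: 12, 5: 40}


def accounting():
    """Recompute the derived counts and percentages from the quoted raw counts."""
    digits_found = UTTERANCES_FOUND + DOUBLE_WORDS - FALSE_ALARMS   # assumed: 805
    scored = UTTERANCES_FOUND - DOUBLE_WORDS                        # 790 utterances
    missed = TOTAL_DIGITS - digits_found                            # 15 digits
    # Errors charged to the endpoint detector: types 1-3, false alarms, misses.
    endpoint_errors = (ERRORS_BY_TYPE[1] + ERRORS_BY_TYPE[2]
                       + ERRORS_BY_TYPE[3] + FALSE_ALARMS + missed)
    return {
        "digits_found": digits_found,
        "scored": scored,
        "missed": missed,
        "endpoint_errors": endpoint_errors,
        "detection_pct": 100.0 * digits_found / TOTAL_DIGITS,
        "accuracy_pct": 100.0 * CORRECT / scored,
        "hit_rate_pct": 100.0 * (1.0 - endpoint_errors / (scored + missed)),
    }


if __name__ == "__main__":
    for name, value in accounting().items():
        print(name, round(value, 1))
```

Under this reading, the five error types plus the false alarms add up to the 79 errors among the 790 scored utterances.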


4.5 Alternatives to top-down approach 


Several other methods of endpoint detection were examined during 
our study. Initially, we tried to improve the bottom-up algorithm by 
first filtering the speech into four bands. Endpoint detection was 
implemented in each of the four bands and then combined based on a 
set of rules. The filter bank that we implemented used filters from 100 
to 500 Hz, 500 to 1000 Hz, 1000 to 2000 Hz, and 2000 to 3200 Hz, with 
a small amount of overlapping. This approach yielded results signifi- 
cantly worse than the top-down approach described here. We then 
included the filter bank approach in the top-down endpoint detector. 
This also degraded the overall system accuracy. Another technique 
that was examined, though computationally expensive, was the level- 
building speech recognition algorithm of Myers and Rabiner. This 
allowed for an open-ended dynamic time-warp space, and, therefore, 
the recognizer itself could possibly find the correct endpoints. This 
approach neither increased nor decreased the accuracy of the endpoint 
detector. 


V. DISCUSSION 


Table II shows the results of the recognition experiments run on 
the different endpoint detection algorithms. The first point to make 
is that while the new endpoint detector accurately detected 94.8 
percent of all words, the old approach only found 68.5 percent. This 
translates into a recognition error rate component due entirely to 
nondetection of speech of 5.2 percent and 31.5 percent for the top-down 
and bottom-up algorithms, respectively. 

We also see that the bottom-up algorithm correctly recognized 85.2 
percent of the words it detected, while the top-down approach 
recognized 90.0 percent of the words it detected. If we take into account all 
errors (endpoints and recognizer) made in processing the 820-word 
database, the bottom-up endpoint approach led to a total recognition 
accuracy of 59.1 percent, while the top-down approach had an overall 
digit recognition accuracy of 89.4 percent. Clearly the top-down 
endpointing algorithm is superior to the bottom-up approach. 


Table II—Comparison of bottom-up endpoint detector with new top-down endpoint detector 

Endpoint     Words Detected    Recognition Accuracy on     Overall Recognition 
Algorithm    (Percent)         Words Detected (Percent)    Accuracy (Percent) 
Bottom-up    68.5              85.2                        59.1 
Top-down     98.2              90.0                        89.4 


VI. SUMMARY 


We have described a new approach to word endpoint detection. We 
call it a top-down design. Experimental results have been presented 
indicating that this new approach is able to detect words in highly 
variable noise environments, as are observed in the telephone network, 
with much higher accuracy than an earlier implementation of the 
endpoint detector. The performance of this new technique approaches 
that of manual endpoint detection—i.e., their recognition accuracies 
were comparable. 

Signals Over Nondispersive Fading Channels 

This paper presents performance analysis of dual-polarized M-ary 
Quadrature Amplitude Modulated (QAM) signals over nondispersive radio 
channels. In particular, dual-polarized, 16-QAM signals are examined. A simple 
method for adaptive cancellation of static and fade-induced cross-polarization 
interference is introduced. The cancellation is performed at baseband. For 
this canceler, two adaptation methods are studied. The results indicate that 
dual-polarized M-ary QAM is not feasible over fading channels unless means 
of adaptive cancellation of the cross-polarization interference are provided. 
The results also indicate that the adaptive algorithm employed in 
cross-polarization interference cancellation should take into account noise power 
reduction. 


I. INTRODUCTION 


Consider transmission of two orthogonally polarized Quadrature 
Amplitude Modulated (QAM) carriers over a single communication 
route. As an example, envision a dual-polarization radio communica- 
tion system where the available bandwidth is “reused” in order to 
double the route capacity by transmitting two independent M-ary 
QAM signals over the same Radio Frequency (RF) channel, using 
orthogonally polarized waves. Because of channel impairments, such 
as fades or antenna imperfections, however, the orthogonally polarized 
waves are received depolarized at the receiver. Consequently, there 
will be some interference imposed on each signal, causing errors in the 
detection process. We refer to this effect as cross-polarization (x-pol) 
distortion. For such a case, the x-pol parameters have randomly time- 
variant characteristics unknown to the receiver in advance. 

Several methods for canceling the x-pol distortion (XPD) in 
dual-polarized systems have been proposed by investigators.¹⁻⁵ The first 
two references assume access to some beacon signals for cancellation 
of the x-pol distortion. For instance, Chu² considers the use of two 
pilot tones, one for each polarization, to estimate the scattering matrix 
of the radio channel, and applies a differential phase and attenuation 
in the receive antenna feed to eliminate the x-pol interference and 
restore the orthogonality. Steinberger³ proposes a recursive 
equalization structure that operates at RF and experimentally shows that the 
device can cancel the x-pol distortion induced by fades over a 
terrestrial radio link when dual-polarized eight-phase shift keying signals 
are employed. RF/IF x-pol cancellation schemes are suitable when the 
dual-polarized signals have to be transmitted over several 
nonregenerative hops, with cancellation occurring on each hop. Such might be 
the case in many terrestrial radio communication applications where, 
by avoiding baseband x-pol cancellation, the operation is more 
cost-effective in terms of a reduced number of required modems. However, 
for digitally modulated carriers, especially if used over a regenerative 
transmission hop, the x-pol distortion can be eliminated directly at 
baseband as a part of the information detection process. Attempts in 
the area of baseband x-pol cancellation have been made by Culmone⁴ 
and Nichols et al.⁵ 

In this study we suggest a very simple adaptive baseband canceler 
to eliminate the x-pol distortion in dual-polarized M-ary QAM sys- 
tems. Two methods of canceler taps adaptation are evaluated. We 
analyze the performance of such systems by deriving an average 
probability of error as a function of signal-to-noise ratio (s/n), x-pol 
distortion, and nondispersive fade level with or without the baseband 
canceler. The results indicate that over a typical fading radio channel 
with static x-pol of about —25 dB, dual polarization of M-ary QAM 
signals is possible only if some x-pol distortion cancellation method is 
employed. In Section II, we explain the mathematical modeling of the 
dual channel. The adaptive baseband canceler is described in Section 
III. Section IV presents the performance analysis of dual-polarized, 
M-ary QAM systems with or without x-pol distortion cancellation. 
Section V gives the numerical performance results for 16-QAM 
systems. It should be noted that even though the analysis is a baseband 
analysis, the results are also applicable to RF or Intermediate 
Frequency (IF) cancellation systems employing similar adaptation 
algorithms. 


II. DUAL-POLARIZATION CHANNEL MODEL 


We consider transmission of two independent, orthogonally polarized M-ary 
QAM carriers with the same bandwidth and center frequency. The 
bandpass signal on either of the two orthogonal channels can be 
represented by 

$$S_i(t) = R_e\{s_i(t)\exp(j\omega_c t)\}, \quad i = 1, 2, \tag{1}$$

where $R_e\{\cdot\}$ denotes the real part, $s_i(t)$ is the low-pass complex 
envelope, and $\omega_c$ is the nominal carrier frequency. The complex envelope 
can be expressed as 

$$s_i(t) = \sum_{m=0}^{\infty} a_i(m)\,h(t - mT), \quad i = 1, 2, \tag{2}$$

where $a_i(m)$ denotes the complex-valued information symbol stream, 
and $h(t)$ is the complex low-pass equivalent of the overall system 
impulse response. The complex-valued symbols are denoted by 

$$a_i(m) = \delta_i(m) + j\beta_i(m), \quad i = 1, 2,$$

where $\delta_i(m)$ and $\beta_i(m)$ take on elements of the set 
$\{\pm c, \pm 3c, \cdots, \pm(L - 1)c\}$, with $L = \sqrt{M}$, and M the number of levels of the M-ary 
signal. The parameter, L, is chosen to be even. The constant, c, denotes 
the distance of each point in the signal constellation from its nearest 
decision region boundary. The random variables, $\delta_i(m)$ and $\beta_i(m)$, are 
identically distributed and take on the specified values with equal 
probability. Note that in eq. (1) it is assumed that the data sequences 
are synchronized and carrier signals are coherent. These assumptions, 
however, may not be necessary in practice. 
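
The symbol alphabet described above can be enumerated directly. The following is a minimal sketch; the function names `rail_levels` and `qam_constellation` are hypothetical, and unit spacing c = 1 is assumed by default.

```python
import math


def rail_levels(M, c=1.0):
    """Per-rail amplitudes {+-c, +-3c, ..., +-(L-1)c} with L = sqrt(M), L even."""
    L = math.isqrt(M)
    if L * L != M or L % 2 != 0:
        raise ValueError("M must be the square of an even integer")
    return [k * c for k in range(-(L - 1), L, 2)]


def qam_constellation(M, c=1.0):
    """All M complex symbols delta + j*beta, with both rails drawn from rail_levels."""
    levels = rail_levels(M, c)
    return [complex(d, b) for d in levels for b in levels]


if __name__ == "__main__":
    points = qam_constellation(16)
    mean_energy = sum(abs(p) ** 2 for p in points) / len(points)
    print(rail_levels(16), mean_energy)
```

With equally likely symbols the average symbol energy is 2(L² - 1)c²/3, which the sketch reproduces numerically for 16 QAM.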

The channel is assumed to be of the slowly time-variant nondisper- 
sive type that takes two independent streams of data and distorts the 
transmission by introducing a fraction of one stream into the other. A 
practical model of such a channel might be a satellite channel with 
rain-induced attenuation and depolarization, which is usually nondis- 
persive across the channel bandwidth; hence, depolarization of the 
dual-polarized signals over such a channel resembles the described 
scenario. 

It is known that deep multipath fading of the main polarization 
signals on microwave radio routes, in general, is dispersive. However, 
because of the lack of empirical data concerning the dispersiveness of 
the cross-polarized parameters, we confine this analysis to those radio 
channels where all the signals are subjected to nondispersive fading 
(for example, satellite channels). 

The dual channel matrix is characterized by 

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \tag{3}$$

in which the $a_{jk}$'s, j = 1, 2, k = 1, 2, are complex-valued quantities used 
to represent the channel attenuation and phase shift. These quantities 
are randomly time variant; however, the time variations are assumed 
to be slow in comparison to the symbol rate of the signals. Such slow 
variations can be tracked and canceled adaptively. The received 
low-pass equivalent signals can be expressed as 

$$\begin{aligned}
r_1(t) &= a_{11}s_1(t) + a_{12}s_2(t) + n_1(t) \\
r_2(t) &= a_{21}s_1(t) + a_{22}s_2(t) + n_2(t),
\end{aligned} \tag{4}$$

where $n_1(t)$ and $n_2(t)$ are independent, zero-mean, white Gaussian 
processes. The received signals are filtered by receive filters matched 
to the transmit signals and sampled at every symbol period. The 
sampled signals are denoted by $x_i(k)$, i = 1, 2, and are expressed as 

$$\begin{aligned}
x_1(k) &= a_{11}a_1(k) + a_{12}a_2(k) + n_1(k) \\
x_2(k) &= a_{21}a_1(k) + a_{22}a_2(k) + n_2(k).
\end{aligned} \tag{5}$$

The colored noise sequences, $\{n_1(k)\}$ and $\{n_2(k)\}$, are independent 
samples of zero-mean, complex-valued Gaussian processes with equal 
variances 

$$E\{|n_i(k)|^2\} = \sigma_n^2, \quad i = 1, 2,$$

where $E\{\cdot\}$ denotes the statistical average. The factors $a_{11}$ and $a_{22}$ 
represent the in-line attenuation and phase shift; the factors $a_{12}$ and 
$a_{21}$ represent the x-pol coupling on the two channels. 

Data calculated by Chu⁶ show that for linearly polarized waves, the 
behavior of the cross-polarized signal amplitude can be described by 

$$XPL = \frac{|a_{ij}|^2}{|a_{ii}|^2}, \quad i = 1, 2, \quad j = 1, 2, \quad i \neq j, \tag{6}$$

where XPL is defined as the cross-polarization factor of linearly polarized 
waves’ and 


$$XPL = \frac{1}{2}\left(\frac{C}{D}\right)^2 e^{-4\sigma_\theta^2}\left[1 - \cos(4\tau)\,e^{-8\sigma_m^2}\right]\delta^2, \tag{7}$$


where $\sigma_m$ is the standard deviation of the time-varying, mean-canting 
angle $\theta_m$ over various rainstorms, $\sigma_\theta$ is the standard deviation of the 
anisotropy angle $\theta$, $\tau$ is the orientation angle of the quasi-vertical 
polarization, $\delta$ is proportional to the in-line attenuation factor, and C/D 
is proportional to the differential propagation constants. Measured data 
from COMSTAR II follow these calculations closely.⁶ The comparison 
is shown in Fig. 1 of Ref. 6. We will use these results to introduce an 
average probability of error as a function of in-line attenuation level. 

In the next section we describe a simple structure that can reduce 
the x-pol distortion in such systems. 


III. ADAPTIVE BASEBAND CANCELER MODEL 


The adaptive canceler that attempts to remove the x-pol distortion 
is characterized by 

$$W = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix},$$

where the $w_{ij}$'s, i = 1, 2, j = 1, 2, are the canceler coefficients. This adaptive 
device, which is a part of the M-ary QAM detector circuit, is studied 
under two adaptation methods. The first method employs a Least 
Mean-Square (LMS) error algorithm and the second applies channel 
matrix diagonalization. Figure 1 shows the LMS canceler structure. 
The samples at the matched filter output of each receiver are inputs 
to a bank of adaptive filters formed by a set of multiplier accumulators 
(MACs). To update the coefficients of the canceler, each MAC con- 
tains storage elements for storing the result of the multiplication of 
the signal-sample detection error and the complex conjugate of the 
corresponding received signal sample at the matched filter output. 
The calculated coefficients are multiplied by the signal samples at the 
matched filter output and used to cancel the x-pol distortion. The 
detectors shown in Fig. 1 are part of the M-ary QAM demodulators. 

The canceler structure consists of a simple adaptive filter, which 
minimizes the least mean-square error in symbol estimation. The 
theory of this type of filter is well known’ and is solely based on the 
statistical orthogonality principle. According to this theory, mean- 
square error is minimum when the error in symbol estimation is 
statistically orthogonal to the variable being observed. In the case at 
hand, we denote the signal-sample estimation error by 

$$\epsilon_i(k) = a_i(k) - \hat{a}_i(k), \quad i = 1, 2, \tag{8}$$

where $\hat{a}_i(k)$ is the detector input, as shown in Fig. 1. We now select 
those canceler coefficients for which the mean-square error is 
minimum, i.e., the results of solving $\min E\{|\epsilon_1(k)|^2 + |\epsilon_2(k)|^2\}$. After 
proceeding with the minimization process, the following set of 
equations will lead to the optimum determination of the coefficients $w_{ij}^0$, 
i = 1, 2, j = 1, 2: 

$$\begin{aligned}
\left[|a_{11}|^2 + |a_{12}|^2 + \sigma_n^2\right]w_{11}^0 + \left[a_{21}a_{11}^* + a_{12}^*a_{22}\right]w_{12}^0 &= a_{11}^* \\
\left[a_{21}^*a_{11} + a_{12}a_{22}^*\right]w_{11}^0 + \left[|a_{21}|^2 + |a_{22}|^2 + \sigma_n^2\right]w_{12}^0 &= a_{21}^* \\
\left[|a_{11}|^2 + |a_{12}|^2 + \sigma_n^2\right]w_{21}^0 + \left[a_{21}a_{11}^* + a_{12}^*a_{22}\right]w_{22}^0 &= a_{12}^* \\
\left[a_{21}^*a_{11} + a_{12}a_{22}^*\right]w_{21}^0 + \left[|a_{21}|^2 + |a_{22}|^2 + \sigma_n^2\right]w_{22}^0 &= a_{22}^*.
\end{aligned} \tag{9}$$

Fig. 1—Block diagram of adaptive LMS baseband canceler. 

As Fig. 1 shows, the signal samples at the canceler output for each 
channel can be expressed as 

$$\begin{aligned}
\hat{a}_1(k) &= w_{11}x_1(k) + w_{12}x_2(k) \\
\hat{a}_2(k) &= w_{21}x_1(k) + w_{22}x_2(k),
\end{aligned} \tag{10}$$

where $\hat{a}_1$ and $\hat{a}_2$ are complex quantities. In eq. (10), by substituting the 
$x_m(k)$'s, m = 1, 2, of eq. (5) and the coefficients $w_{ij}^0$, i = 1, 2, j = 1, 2, of 
eq. (9), one can form a decision variable for each channel for the 
derivation of probability of error. 
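
The linear system of eq. (9) is small enough to solve in closed form. The sketch below writes it in matrix form as W(AA^H + σ²I) = A^H, where A is the channel matrix of eq. (3) and the superscript H denotes the conjugate transpose; this matrix form, the unit symbol power it presumes, and all function names are our assumptions for illustration, not notation from the paper.

```python
def conj_transpose(m):
    """Conjugate transpose of a 2x2 complex matrix given as nested lists."""
    return [[m[0][0].conjugate(), m[1][0].conjugate()],
            [m[0][1].conjugate(), m[1][1].conjugate()]]


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def optimum_coefficients(a, sigma2):
    """Solve W (A A^H + sigma2 I) = A^H for the 2x2 canceler matrix W.

    sigma2 is the noise variance relative to unit symbol power (assumed).
    """
    ah = conj_transpose(a)
    r = matmul(a, ah)
    r[0][0] += sigma2
    r[1][1] += sigma2
    return matmul(ah, inverse(r))


if __name__ == "__main__":
    channel = [[1 + 0j, 0.1j], [0.05 + 0j, 1 + 0j]]   # illustrative values only
    print(optimum_coefficients(channel, 0.01))
```

Setting `sigma2` to zero reduces the solution to the inverse of the channel matrix, which is the zero-forcing case.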

An alternate solution to this type of optimization problem, which 
uses the steepest descent algorithm and is simple to implement, was 
suggested by Widrow.® The solution is recursive and states that 


$$w_{ij}^{(k+1)} = w_{ij}^{(k)} + \epsilon_i(k)\,x_j^*(k), \quad i = 1, 2, \quad j = 1, 2, \tag{11}$$

where * denotes the complex conjugate and k represents the sampling 
instant. In eq. (11) the noisy estimates of the cross-correlation of the 
observed signal and error signal are used as unbiased estimates to 
update the canceler coefficients at every baud interval. Such algo- 
rithms are well known in adaptive filtering and equalization. The 
realization of eq. (11) is shown in Fig. 1. The MACs in the figure 
update eq. (11) by storing the result of multiplication of signal samples 
and detection error samples. 
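
The recursion of eq. (11) can be tried out on a simulated channel. The sketch below is a training-mode illustration only: it assumes the transmitted symbols are known to the receiver, uses four-point symbols with components ±1, and introduces a step size `mu` that does not appear in eq. (11) as printed. The channel values and the function name `lms_canceler` are likewise hypothetical.

```python
import random


def lms_canceler(a, n_symbols=3000, mu=0.02, noise_std=0.01, seed=1):
    """Adapt a 2x2 canceler with w_ij <- w_ij + mu * err_i * conj(x_j)."""
    rng = random.Random(seed)
    w = [[1 + 0j, 0j], [0j, 1 + 0j]]          # start from a pass-through canceler
    for _ in range(n_symbols):
        # One symbol per polarization, real and imaginary parts drawn from {-1, +1}.
        sym = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(2)]
        # Matched-filter samples as in eq. (5): channel mixing plus Gaussian noise.
        x = [a[i][0] * sym[0] + a[i][1] * sym[1]
             + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
             for i in range(2)]
        for i in range(2):
            estimate = w[i][0] * x[0] + w[i][1] * x[1]
            err = sym[i] - estimate            # training mode: true symbol is known
            for j in range(2):
                w[i][j] += mu * err * x[j].conjugate()
    return w


if __name__ == "__main__":
    channel = [[1 + 0j, 0.2j], [0.15 + 0j, 1 + 0j]]   # illustrative values only
    print(lms_canceler(channel))
```

With low noise the adapted coefficients should approach the inverse of the channel matrix, so the product of canceler and channel is close to the identity.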

As an alternative adaptation method, we consider a case where the 
canceler coefficients are determined by forcing the x-pol interference 
on each channel to zero.® This is equivalent to diagonalizing the overall 
channel matrix; that is, we substitute the $x_m(k)$'s of eq. (5) into eq. (10) and 
force the coefficient of the undesired signal to zero on each channel: 

$$\begin{aligned}
w_{11}a_{12} + w_{12}a_{22} &= 0 \\
w_{21}a_{11} + w_{22}a_{21} &= 0.
\end{aligned} \tag{12}$$


In this case we refer to the canceler as a diagonalizer. Amitay describes 
the realization of an IF diagonalizer.? In analogy to intersymbol- 
interference removal by zero-forcing equalization, this method can 
also be referred to as zero-forcing cancellation. Figure 2 shows a block 
diagram of the diagonalizer. Note that in canceling the interference, 
the diagonalizer neglects the thermal noise completely. 
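
The zero-forcing constraints of eq. (12) fix only the ratio of the two coefficients on each channel. The sketch below picks the normalization w11 = w22 = 1, which is our assumption, and also reports the factor by which the noise power grows at each output, illustrating why ignoring the thermal noise matters. The function names are hypothetical.

```python
def diagonalizer(a):
    """Coefficients satisfying eq. (12), normalized so that w11 = w22 = 1 (assumed)."""
    return [[1 + 0j, -a[0][1] / a[1][1]],
            [-a[1][0] / a[0][0], 1 + 0j]]


def overall_response(w, a):
    """Product W*A: the gain from each transmitted symbol to each canceler output."""
    return [[sum(w[i][k] * a[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def noise_gain(w):
    """Output noise power on each channel relative to the input noise power."""
    return [abs(w[i][0]) ** 2 + abs(w[i][1]) ** 2 for i in range(2)]


if __name__ == "__main__":
    channel = [[1 + 0j, 0.06 + 0.08j], [0.08 - 0.06j, 1 + 0j]]   # illustrative values
    w = diagonalizer(channel)
    print(overall_response(w, channel))
    print(noise_gain(w))
```

For equal in-line gains and a cross-coupling magnitude of 0.1, the off-diagonal terms vanish while the noise power at each output rises by the factor 1.01.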

In the following section we will evaluate the canceler for both cases 
described. 


IV. SYSTEM PERFORMANCE ANALYSIS 


In this section we derive an upper bound on the average probability 
of error for dual-polarized M-ary QAM signals with and without the 
x-pol distortion canceler. Throughout this section it is assumed the 
data sequences on the two polarized channels are independent, equally 
likely, M-ary QAM signals. The channel is characterized by the matrix 
introduced earlier. To simplify the derivation, with no loss of 
generality, we can assume the phase angles of $a_{11}$ and $a_{22}$ are zero and use 
the normalized notations 

$$\frac{a_{12}}{a_{11}} = \xi_1 \exp(j\phi_1), \qquad \frac{a_{21}}{a_{22}} = \xi_2 \exp(j\phi_2), \qquad \text{and} \qquad \frac{a_{11}}{a_{22}} = \xi_3. \tag{13}$$

All the variables introduced in eq. (13) are time variant, but the time 
variations are slow relative to symbol rate so the receiver can obtain 
perfect estimates of the channel matrix components. The phase 
parameters $\phi_1$ and $\phi_2$ in eq. (13) are uniformly distributed over $[-\pi, \pi]$. 
Also, to further simplify the presentation of the results, we assume 
$\xi_1 = \xi_2 = \xi$ and $\xi_3 = 1$. This model was shown by Chu to be a valid 
model for depolarization of dual-polarized waves due to heavy 
rainfall.¹⁰ Furthermore, this simplified model of an x-pol channel still 
provides means of evaluating the canceler performance, and the error 
probability bounds derived should be useful in preliminary system 
planning. 

Fig. 2—Block diagram of adaptive diagonalizer. 


4.1 Performance in the presence of the canceler 


In this section we first derive an average probability of error applying 
the diagonalizer. Then we proceed with deriving an average error 
probability for the LMS canceler. 

To calculate a simple upper bound on the probability of error 
performance when the diagonalizer is present, we use the complex 
valued estimates of the data symbols at the canceler output given in 
eq. (10), along with the constraints in eq. (12) and matched filter 
outputs of eq. (5), to define the following simplified decision variables 
for the two channels: 


$$\begin{aligned}
\hat{a}_1(k) &= a_{11}\left[1 - \xi^2 e^{j(\phi_1 + \phi_2)}\right]a_1(k) + n_1(k) - n_2(k)\,\xi e^{j\phi_1} \\
\hat{a}_2(k) &= a_{22}\left[1 - \xi^2 e^{j(\phi_1 + \phi_2)}\right]a_2(k) + n_2(k) - n_1(k)\,\xi e^{j\phi_2}.
\end{aligned} \tag{14}$$

As this equation shows, the decision variable for channel i depends on 
the parameters of channel j, i ≠ j, i = 1, 2, j = 1, 2, namely, $\phi_j$ and $n_j$. 
Now consider the in-phase and quadrature-phase components of each 
channel. The decision variable for channel 1 in eq. (14) can be 
expressed in terms of its real and imaginary parts as 

$$\begin{aligned}
z_R(k) &= -\delta(k)\,\xi^2\cos(\phi_1 + \phi_2) + \beta(k)\,\xi^2\sin(\phi_1 + \phi_2) \\
&\quad + \frac{1}{a_{11}}\,n_{1R}(k) + \frac{1}{a_{11}}\,n_{2I}(k)\,\xi\sin(\phi_1) - \frac{1}{a_{11}}\,n_{2R}(k)\,\xi\cos(\phi_1) \\
z_I(k) &= -\beta(k)\,\xi^2\cos(\phi_1 + \phi_2) - \delta(k)\,\xi^2\sin(\phi_1 + \phi_2) \\
&\quad + \frac{1}{a_{11}}\,n_{1I}(k) - \frac{1}{a_{11}}\,n_{2I}(k)\,\xi\cos(\phi_1) - \frac{1}{a_{11}}\,n_{2R}(k)\,\xi\sin(\phi_1),
\end{aligned} \tag{15}$$

where $n_{iR}(k)$ and $n_{iI}(k)$, i = 1, 2, are the real and imaginary parts of 
Gaussian noise samples at sampling instant k, which are identically 
distributed random variables with the same variance, $\sigma_n^2$. Now, an 
error is made on channel 1 if $|z_R| > c$ or $|z_I| > c$, where c, as stated 
earlier, is the signal distance from its nearest decision region boundary 
in the signal constellation. Therefore, the probability of error on 
channel 1 can be expressed as 

$$P_e = \frac{L - 1}{L}\left\{P_r(|z_R| > c) + P_r(|z_I| > c)\right\}. \tag{16}$$

To derive the error probability, we apply the well-known¹¹ Chernoff 
bound, which states 

$$P_r\{z > c\} \le \exp(-\lambda c)\,E\{\exp(\lambda z)\}, \quad \lambda \ge 0, \tag{17}$$

where $E\{\cdot\}$ denotes the statistical average over the random variable z. 
This is valid for any $\lambda \ge 0$. Using the positive $\lambda$ that minimizes the 
right-hand side of eq. (17) establishes the least upper bound. Hence, 
we apply eq. (17) to eqs. (15) and (16) combined. The actual derivation 
of the upper bound is in Appendix A. 
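
To see how the minimization over λ works, consider the simplest case in which z is purely zero-mean Gaussian with standard deviation σ. Then E{exp(λz)} = exp(λ²σ²/2), the right-hand side of eq. (17) is minimized at λ = c/σ², and the bound becomes exp(-c²/2σ²). The sketch below checks this numerically against the exact Gaussian tail; it is an illustration of eq. (17) only and does not include the interference terms handled in Appendix A. The function names are hypothetical.

```python
import math


def chernoff_bound(c, sigma, lam):
    """exp(-lam*c) * E{exp(lam*z)} for zero-mean Gaussian z with std sigma."""
    return math.exp(-lam * c + 0.5 * (lam * sigma) ** 2)


def best_lambda(c, sigma):
    """Value of lam that minimizes the Gaussian Chernoff bound."""
    return c / sigma ** 2


def exact_tail(c, sigma):
    """Exact P{z > c} for zero-mean Gaussian z with std sigma."""
    return 0.5 * math.erfc(c / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    c, sigma = 1.0, 0.5
    lam = best_lambda(c, sigma)
    print(lam, chernoff_bound(c, sigma, lam), exact_tail(c, sigma))
```

The optimized bound is looser than the exact tail probability but has the same exponential dependence on c²/σ², which is what makes it convenient for the closed-form bounds that follow.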

The resulting probability of error bound is 





es ..|2 
ptt tesa a iearrep ithe 68 
where 
y= i : 7 = the unfaded s/n 
|a;;| = the in-line voltage on channel 1, i= 1, 2. 
We define 

$$XPD = 20\log_{10}\xi, \quad \text{dB} \tag{19}$$

as a measure of x-pol distortion to represent the cross-coupling 
between the two channels, and 

$$v = 20\log_{10}|a_{ii}|, \quad \text{dB}, \quad i = 1, 2, \tag{20}$$

as a measure of flat-fade level. When there is no fade, v = 0 dB and 
the only contribution to x-pol distortion is due to the static effects, 
such as antenna imperfections, in which case XPD is denoted by 
$XPD_0$. The fade-induced part of the x-pol distortion is put into effect 
when $|a_{ii}| < 1$, i = 1, 2. Now, eq. (7) can be applied to relate the 
average probability of error to in-line attenuation and to remove the 
x-pol factor in eq. (18). That is, 
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LaH=A 3 
.. [2 ate a 
PA| aii | ) S L exp | 2(L? _ 1) 


{2 
—— ee! ___| g=1,2; (21) 
1+ y XPL?|a,|? + XPL 


We now proceed with reevaluating the probability of error when the 
LMS canceler is employed. As we stated in Section III, the LMS 
canceler adaptively calculates its coefficients so that the mean-square 
error is minimized. By using the optimum set of coefficients of eq. (9) 
in eq. (10), we can define a new decision variable for each channel; 
that is, 


$$\begin{aligned}
\hat{a}_1 &= (w_{11}^0a_{11} + w_{12}^0a_{21})\,a_1 + (w_{11}^0a_{12} + w_{12}^0a_{22})\,a_2 + w_{11}^0n_1 + w_{12}^0n_2 \\
\hat{a}_2 &= (w_{21}^0a_{11} + w_{22}^0a_{21})\,a_1 + (w_{21}^0a_{12} + w_{22}^0a_{22})\,a_2 + w_{21}^0n_1 + w_{22}^0n_2,
\end{aligned} \tag{22}$$

where the $w_{ij}^0$'s are the optimum coefficients. As we see again, the decision 
variable for channel i depends on the parameters of channel j, i ≠ j, 
i = 1, 2, j = 1, 2. In a manner similar to what was explained for the 
diagonalizer, we can calculate the probability of error for the LMS 
canceler. The actual derivation of the bound is in Appendix B. The 
resulting probability of error is 


$$P_e \le \frac{L-1}{L\pi} \int_0^{2\pi} \left(1 - \frac{\phi}{2\pi}\right) \exp\left\{ -\frac{3}{2(L^2-1)} \, \frac{\gamma V(\phi)}{\gamma \Theta(\phi) + \Lambda(\phi)} \right\} d\phi, \tag{23}$$

where $\gamma$ is the unfaded s/n
and $V(\phi)$, $\Theta(\phi)$, and $\Lambda(\phi)$ are defined in eq. (34) of Appendix B. An
attempt to solve eq. (23) in closed form was inconclusive,
so it was evaluated numerically on a computer. To express eq. (23)
only in terms of fade level, again we can use eq. (7) to remove $\xi$.

To make a comparison, we remove the canceler and repeat the 
derivation of the probability of error for a baseline, dual-polarized, 
M-ary QAM system. 


4.2 Performance of baseline, dual-polarized, M-ary QAM system 


In this case the error-bound derivation is simplified because, for
channel $i$, the decision variables are independent of the other channel's
parameters, namely, $\phi_j$ and $n_j$, $j \ne i$, $i = 1, 2$, $j = 1, 2$. Therefore, eq.
(15) is reduced to


zr(k) = £62(k)cos(d1) — EG2(k)sin(¢:) + nir(k) 
Qi (24) 


SiO) SPB Picea) Bein) + ~ ie. 


Using a similar approach, as for the previous cases, we derive an upper 
bound on the error probability. This derivation is given in Appendix 
C. The result is 


$$P_e \le \frac{L-1}{L} \exp\left\{ -\frac{3}{2(L^2-1)} \, \frac{\gamma |a_{ii}|^2}{1 + \gamma \xi^2 |a_{ii}|^2} \right\}, \qquad i = 1, 2. \tag{25}$$


This is a simple bound to calculate, and in terms of fade level, it can 
be expressed as 


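$$P_e(|a_{ii}|^2) \le \frac{L-1}{L} \exp\left\{ -\frac{3}{2(L^2-1)} \, \frac{\gamma |a_{ii}|^2}{1 + \gamma |a_{ii}|^2 \cdot \mathrm{XPL}} \right\}, \qquad i = 1, 2. \tag{26}$$

The numerical results in the following section illustrate the performance.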
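The behavior of the baseline bound can be sketched numerically. The example below assumes that eq. (25) reads P_e ≤ ((L−1)/L) exp{−[3/(2(L²−1))] γ|a_ii|²/(1 + γξ²|a_ii|²)}, with XPD = 20 log10 ξ as in eq. (19) and a flat fade of F dB corresponding to |a_ii|² = 10^(−F/10). The function name and its arguments are hypothetical, chosen only for this illustration.

```python
import math

def baseline_bound(snr_db, xpd_db, fade_db=0.0, levels=4):
    """Upper bound of the assumed form of eq. (25); levels=4 corresponds to 16 QAM."""
    gamma = 10.0 ** (snr_db / 10.0)      # unfaded s/n as a power ratio
    xi_sq = 10.0 ** (xpd_db / 10.0)      # xi^2, since XPD = 20*log10(xi)
    a_sq = 10.0 ** (-fade_db / 10.0)     # |a_ii|^2 for a flat fade of fade_db dB
    # Effective signal-to-(noise + cross-pol interference) ratio.
    sinr = gamma * a_sq / (1.0 + gamma * xi_sq * a_sq)
    return (levels - 1) / levels * math.exp(-3.0 * sinr / (2.0 * (levels ** 2 - 1)))

if __name__ == "__main__":
    for xpd in (-15.0, -25.0, -35.0):
        row = [baseline_bound(snr, xpd) for snr in (10.0, 20.0, 30.0)]
        print(xpd, ["%.3e" % p for p in row])
```

Under this assumed form, the bound stops improving at high s/n: as γ grows, the exponent approaches −3/(2(L²−1)ξ²), so the residual error floor is set by the x-pol distortion alone.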


V. NUMERICAL PERFORMANCE FOR 16 QAM 


In this section we evaluate the bounds derived in the previous section 
for dual-polarized 16-QAM signals. 

First, we consider 16 QAM with no cancellation. The upper bound
of eq. (25) is shown in Fig. 3 for three different static x-pol distortion
($\mathrm{XPD}_0$) values. These curves represent the average error probability
bound for 16-QAM signals as a function of static x-pol distortion and
s/n when no cancellation is adopted and no fading exists. Figure 3
also shows the theoretical performance of 16 QAM and the
calculated upper bound for the case when there is no
fade and no x-pol distortion. As we see, the upper-bound curve is very
close to the actual theoretical curve. These results indicate that
improving the static x-pol can improve the overall performance
substantially. Figure 4 demonstrates the bound in eq. (26) for a 5-dB flat
fade, using Fig. 1 of Reference 6, which predicts an XPL of 28 dB for
a 5-dB flat fade. As we see, the error probability is highly sensitive
to the fade level.

Fig. 3. Probability of error vs. s/n for dual 16 QAM without canceler; no fading
exists.

Fig. 4. Probability of error vs. s/n for dual 16 QAM without canceler; 5-dB flat fade
applied.

Next, we apply the canceler and the diagonalizer described earlier
and show the bounds in eqs. (18) and (23) in Fig. 5 for different
$\mathrm{XPD}_0$ values. High values of $\mathrm{XPD}_0$ could occur in poorly aligned antenna
systems. As Fig. 5 illustrates, the LMS canceler and the diagonalizer
behave quite differently. The LMS canceler improves the performance
significantly even at rather poor XPDs, e.g., $\mathrm{XPD}_0 = -5$ dB, while the
diagonalizer is almost useless for such a case. As the $\mathrm{XPD}_0$ value is
improved, e.g., for $\mathrm{XPD}_0 = -25$ dB, the performance of the two
cancelers becomes the same. The difference arises because, as XPD increases, the
diagonalizer coefficients grow in a direction to cancel XPD while
neglecting the thermal noise completely; consequently, the noise power
in each channel is increased strongly. The LMS canceler, however, by
minimizing the combined noise and XPD power, produces an
acceptable performance. On the other hand, as XPD is improved, the
diagonalizer becomes as attractive as the LMS canceler since there is
not much XPD to cancel; consequently, there is not much noise
enhancement. However, over fading channels where XPD (in dB) can
even be positive, use of the LMS canceler will ensure a more reliable
system.
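The noise-enhancement argument can be illustrated with a deliberately simplified model, which is not the derivation of Appendices A and B. Assume unit in-line gains, equal cross-coupling ξ on both paths, unit signal power, and noise variance σ² per channel. Under these assumptions the diagonalizer is a zero-forcing inverse of the 2×2 channel matrix, and the converged LMS canceler is taken to be the minimum mean-square-error solution. The function name and the closed-form 2×2 inversion below are hypothetical helpers for this sketch.

```python
import math

def output_sinr(xpd_db, snr_db, phi=0.0):
    """Return (diagonalizer, LMS/MMSE) output signal-to-interference-plus-noise
    ratios for channel 1 under the simplified symmetric 2x2 model."""
    xi_sq = 10.0 ** (xpd_db / 10.0)       # xi^2, with XPD = 20*log10(xi)
    sigma_sq = 10.0 ** (-snr_db / 10.0)   # noise variance for unit signal power
    # Gram matrix G^H G of the channel: diagonal p, off-diagonal magnitude^2 q_sq,
    # where phi = phi_1 + phi_2 is the sum of the two cross-coupling phases.
    p = 1.0 + xi_sq
    q_sq = 2.0 * xi_sq * (1.0 + math.cos(phi))

    def inverse_diagonal(loading):
        # First diagonal entry of (G^H G + loading * I)^-1 for the 2x2 case.
        d = p + loading
        return d / (d * d - q_sq)

    # Zero forcing ignores the noise (loading = 0) and is singular when
    # xi = 1 and phi = 0; avoid that point in this sketch.
    sinr_zf = 1.0 / (sigma_sq * inverse_diagonal(0.0))
    # MMSE loads the Gram matrix with the noise variance.
    sinr_mmse = 1.0 / (sigma_sq * inverse_diagonal(sigma_sq)) - 1.0
    return sinr_zf, sinr_mmse

if __name__ == "__main__":
    for xpd in (-5.0, -15.0, -25.0):
        zf, mmse = output_sinr(xpd, snr_db=10.0)
        print(xpd, round(10 * math.log10(zf), 2), round(10 * math.log10(mmse), 2))
```

In this model the two solutions coincide when the coupling is weak and separate as ξ grows, which mirrors the qualitative trend described for Fig. 5.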

Fig. 5. Probability of error vs. s/n for dual 16 QAM with canceler; no fading exists.

We then apply 5-dB flat fade and draw the average error rate bounds
for both cancelers as a function of fade level in Fig. 6. As we see, the
XPD is removed for a practically reasonable static XPD. The horizontal
translation of the curves reflects the 5-dB signal power loss due to
fade since we have employed unfaded s/n in sketching these figures.

Fig. 6. Probability of error vs. s/n for dual 16 QAM with canceler; 5-dB flat fade
applied.

Note that rain fading increases the system noise temperature as
follows. If we assume the noise temperature of the receiver and the
following stages to be $T_0$, and in the presence of rain, $T_r$, the increased
system noise temperature in rain is

$$T_r = T_0 + (1 - v)\,T_{\mathrm{rain}},$$

where
$v = |a_{ii}|$ = in-line fade level
$T_{\mathrm{rain}}$ = effective temperature of the rain.

For example, for a flat fade of 5 dB and a rain temperature of 280 K, the
system noise temperature increases by 122.5 K. The additional increase
in noise temperature will further translate the curves in Fig. 6 to the
right, horizontally. In practice the noise power increase has to be
factored into the system power budget.
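The 122.5 K figure can be checked directly. The sketch below assumes the increase is (1 − v) T_rain with v the in-line voltage fade level, so that a fade of F dB gives v = 10^(−F/20); this reading reproduces the number quoted in the text. The function name is hypothetical.

```python
def rain_noise_increase(fade_db, rain_temperature_kelvin):
    """Increase in system noise temperature, assuming delta_T = (1 - v) * T_rain."""
    v = 10.0 ** (-fade_db / 20.0)   # in-line voltage fade level |a_ii|
    return (1.0 - v) * rain_temperature_kelvin

if __name__ == "__main__":
    print(round(rain_noise_increase(5.0, 280.0), 1))  # 5-dB fade, 280 K rain
```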


VI. CONCLUSIONS 


In this paper we studied the performance of dual-polarized, M-ary 
QAM signals in terms of average probability of error as a function of 
s/n, x-pol distortion, and fade level. An x-pol cancellation method 
operating at baseband was suggested. Two different adaptation meth- 
ods were considered in calculating the canceler coefficients. In partic- 
ular, the performance was evaluated with and without the XPD 
cancellation for 16-QAM signals in dual polarization with or without 
fade. The results indicate that without applying some kind of x-pol 
cancellation, dual polarization of M-ary QAM signals is not feasible.
The results also indicate that the adaptive algorithm employed in 
cross-polarization interference cancellation should take into account 
noise power reduction. 
APPENDIX A
Derivation of Error Bound for the Diagonalizer 


Consider one of the dual-polarized channels, e.g., channel 1. Using 
the Chernoff bound for the in-phase rail of the M-ary QAM signal 
and eq. (15) in Section 4.1, 


P,., = P,{| zr | > c} < exp(—Ac) Blox | -raetoste + rBE 


-sin(¢) + no Nir — ua (naseos(o = nasin(6:))|f » (27) 


where 


$\phi = \phi_1 + \phi_2$.


Since the terms in the argument of the inner exponential are
independent of each other, given that $\phi$ and $\phi_1$ are known, we can average
over the independent variables first and then take the average of the result
with respect to the phase variables. So,


P., < exp(-Ne a Blexp( Nikos )]-Eslexp(\Gé’sin ¢)] 


Enyfexp ( ran [Bg e= (- : rarcos(n)) 
11 1 
Bn) exp ( nasin(@))|} (28) 


where $\hat{a}$ and $\tilde{a}$ are the real and imaginary parts of a uniformly
distributed, complex-valued random variable. We now calculate the
statistical averages in eq. (28):


T,=E x = NS 2 
1 = &n,,| CXP bis Mirg}| = exp 9021 On 


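$$\Gamma_2 = E_{\hat{a}}\{\exp(-\lambda \hat{a} \xi^2 \cos\phi)\} = \frac{2}{L} \sum_{i=1}^{L/2} \cosh\{\lambda \xi^2 c\,(2i-1)\cos(\phi)\}.$$

And, since

$$\frac{2}{L} \sum_{i=1}^{L/2} \cosh\{(2i-1)x\} \le \exp\left\{\frac{L^2-1}{3}\,\frac{x^2}{2}\right\},$$

then,

$$\Gamma_2 \le \exp\left\{\frac{\lambda^2 \xi^4 c^2}{2}\,\frac{L^2-1}{3}\,\cos^2(\phi)\right\}.$$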
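The inequality used for this average can be checked numerically. The sketch below assumes it reads (2/L) Σ cosh{(2i−1)x} ≤ exp{(L²−1)x²/6}, where the left side is the moment-generating function of a symbol drawn uniformly from the L levels ±1, ±3, ..., ±(L−1) and (L²−1)/3 is the variance of those levels. The function names are hypothetical.

```python
import math

def level_mgf(x, levels):
    """(2/L) * sum_{i=1}^{L/2} cosh((2i - 1) x): average of exp(x * a) over the
    equiprobable levels a = +-1, +-3, ..., +-(L - 1)."""
    half = levels // 2
    return (2.0 / levels) * sum(math.cosh((2 * i - 1) * x) for i in range(1, half + 1))

def gaussian_cap(x, levels):
    """exp{((L^2 - 1)/3) * x^2 / 2}: the assumed upper bound on level_mgf."""
    return math.exp((levels ** 2 - 1) / 3.0 * x ** 2 / 2.0)

def inequality_holds(levels, x_max=2.0, points=401):
    """Check level_mgf <= gaussian_cap on a uniform grid over [-x_max, x_max]."""
    for k in range(points):
        x = -x_max + 2.0 * x_max * k / (points - 1)
        # Small relative slack absorbs rounding at x = 0, where both sides equal 1.
        if level_mgf(x, levels) > gaussian_cap(x, levels) * (1.0 + 1e-12):
            return False
    return True

if __name__ == "__main__":
    for levels in (2, 4, 8, 16):
        print(levels, inequality_holds(levels))
```

The grid check is only a sanity test, not a proof. It shows the bound holding over the tested range for the constellation sizes of interest.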
Similarly, 
1Nee 2 





2 
T3 = Ep{exp(APé’sin(¢))} < exp a sink) | 


252 G2 2 
T= Bn ex (- > - narcosis) )} = of cost) 
2 
r5< Eng exp (x nasin(¢)} < exp Bee see 
11 


Therefore, 
etl cox (er Mio? , Mok, Neos
ond 1 = exP |" 3 9 * 902, * “oa3, 


1 B-1#e2 2 
a= om é = + . 29 
P., < exp Ac + Aa fer + 3 Qe * dak, (29) 


and 














If we repeat the derivation for $P_{e_2} = P_r\{|z_i| > c\}$, by
symmetry we find that the result is the same as for $P_{e_1}$; i.e.,
E-1 
Pe = Gy (Pe, + Pe,)- 
We now calculate the least upper bound on $P_{e_1}$ by minimizing the
argument of the exponential in eq. (29) with respect to $\lambda$. The result
for


c 








ise et eee 

| bende = te _ 

2 Te ae 

ai 3 on Qi 

is 
ae ee | 3 ylan |? 
< a 3 

LOOP) a=) 14+ yen Pe 20) 


For channel 2 we find a similar result using $|a_{22}|^2$ in eq. (30) instead
of $|a_{11}|^2$.


APPENDIX B 


Derivation of Error Bound for the LMS Canceler 


The derivation of error bound is somewhat tedious in this case. We 
employ the decision variables of eq. (22) of Section 4.1 and after some 
mathematical manipulations, find their real and imaginary parts. For 
example, for channel 1, by introducing 


$$v = |a_{11}| \quad \text{and} \quad \phi = \phi_1 + \phi_2 \tag{31}$$
en see a Aa Phe OR. 4s Das 
— E-nyy + F-nop — G-no] 
a= ines 5 AiBy 4 Bebe 4 Cots & Derg 


+ E-nyp + F-ng + G-nopl, 
where
A = —(v’o2 + vo? + of) 
B = £v’o2[cos ¢; + cos ¢y] 
C = év’o2[sin ¢1 — sin ¢9] 
D= vr + voz — vécos o 
E=/,/ sin ¢ 
F = —v°£ cos $1 + [v°£ + véozZ]cos ¢2 
G = -r°£ sin ¢; — [VE + véo?]sin do 
H = [v’& + v? + of)? — 2v4é?[1 + cos 9]. (32) 


In a manner similar to Appendix A, we find an upper bound on
$P_{e_1} = P_r\{|z_{1r}| > c\}$ using the Chernoff bound.
Following the method used in Appendix A, we define


/2 
To = Ba,\exp (. (4) )t = 7 > cosh p (4) (2i = ve} 


Ne? 2 —1 (A\ 
cc a ae 77 a 











Similarly, 
T <e Ne al B ° 
casa ame a | 
and 
ts < emp ROH =1(C} 
a A 
Also, 


Similarly, 


2 , (EY 
ron [Fa (F) 
2 , (FY 
rewon [Fa (F) 
so


and 


2 2 
I's = exp E oF (2) i 


Therefore, using the Chernoff bound, 


2,2 L? eee | AZ + B? 2 
P., = Bex Ee + ua iba eo aos 





2 3 H? 
VY ,D+R+P+G@4 
a || 


where $E_\phi\{\cdot\}$ is the expectation with respect to $\phi$. We can minimize the
argument of the exp{·} with respect to $\lambda$. The least upper bound
corresponds to


2 rae rV(¢) 
is Bex | 2(L? — 1) 704) + wale Se) 


This bound is conditioned on $\phi_1$ and $\phi_2$, so by taking the average over
$\phi_1$ and $\phi_2$, the actual bound can be obtained. In eq. (33)
$$\Theta(\phi) = A^2 + B^2 + C^2$$
$$\Lambda(\phi) = D^2 + E^2 + F^2 + G^2$$
$$V(\phi) = H^2.$$
Hence,
O(d) = (v202 + v2 + 04)? + 28 v*to4(1 + cos ¢) 
A(o) = v?(vt + of + Qv?o2 + vitt*t + vite? + vite (34) 
+ Qy*&4o2 + Fok) — Qv*e [ve + 202 + v*Jcos ¢ 
V(o) = {(v? + v?£? + 02)? — Qv*t(1 + cos @)}”. 


Since $\phi_1$ and $\phi_2$ are two independent random variables that are
uniformly distributed over $(-\pi, \pi)$ and $\phi = \phi_1 + \phi_2$, the probability
density function of $\phi$ is

$$f_\phi(\phi) = \frac{1}{2\pi}\left(1 - \frac{|\phi|}{2\pi}\right), \qquad 0 < |\phi| < 2\pi. \tag{35}$$

By using $f_\phi(\phi)$, we can calculate the statistical average of the
right-hand side of the bound in eq. (33) and find the least upper bound on
$P_{e_1}$. Again, by symmetry
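The averaging step lends itself to a short numerical sketch. The code below assumes the triangular density f(φ) = (1/2π)(1 − |φ|/2π) on |φ| < 2π, which is the density of the sum of two independent phases uniform on (−π, π), and averages an arbitrary function of φ with a midpoint rule. The function names and the step count are hypothetical; any φ-dependent bound, such as the right-hand side of eq. (33), could be passed in as `g`.

```python
import math

def phase_sum_pdf(phi):
    """Triangular density of phi = phi_1 + phi_2, each uniform on (-pi, pi)."""
    if abs(phi) >= 2.0 * math.pi:
        return 0.0
    return (1.0 - abs(phi) / (2.0 * math.pi)) / (2.0 * math.pi)

def average_over_phase(g, steps=20000):
    """Midpoint-rule estimate of the integral of g(phi) * pdf(phi) over (-2pi, 2pi)."""
    width = 4.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        phi = -2.0 * math.pi + (k + 0.5) * width
        total += g(phi) * phase_sum_pdf(phi)
    return total * width

if __name__ == "__main__":
    print(average_over_phase(lambda phi: 1.0))                 # total probability
    print(average_over_phase(lambda phi: math.cos(phi) ** 2))  # mean of cos^2(phi)
```

Because the quantities in eq. (34) depend on φ only through cos φ, the integrand is even in φ, and the integral over (−2π, 2π) can equivalently be taken as twice the integral over (0, 2π).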


P., = P,{|Zur| > c} = Pe 


1? 
so
T-1 bet 


Pos (Pat Pale Pe 
and 
L-1 { aan. 
a L 0 7 ( 2) 
_ 3 VG) 
Bi 2(L? — 1) O(¢) + Lot a0) 2) 


2h. 2. 2 
in which y = a 7 and V(¢), 0(¢), and A(@) are defined in eq. 


(34). 


APPENDIX C 
Derivation of Error Bound for the Baseline System 


If we use a similar approach, 
Therefore,
(L — 1) 


P. = = exp(—Ac) -E {Ti -T: T'3} 


or 








- : 1v-i 
P. < irae exp re + ee ata ae vee| (39) 


By minimizing the argument of exp[·] with respect to $\lambda$ and
substituting $\lambda_{\min}$ in eq. (37), the corresponding least upper bound is

and, similarly, the probability of error for channel 2 can be obtained
by substituting $|a_{22}|^2$ for $|a_{11}|^2$ in eq. (40).